3442 Commits

Author SHA1 Message Date
Geza Lore
c96a43b452
Fix unused variable in VL_READMEM_N (#2274) 2020-04-22 17:25:35 -04:00
Wilson Snyder
77915f78db Add experimental-only option. 2020-04-21 20:45:23 -04:00
Geza Lore
c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same as what --trace-fst-thread
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Run time relative to --trace with no threaded tracing (lower is faster):
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace-threads 1. Still, for interactive use it should be
helpful with large designs.
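
The idea of moving trace output onto a separate thread can be sketched as a simple producer/consumer queue. This is only an illustrative sketch and is not Verilator's implementation: the class `TraceWorker`, its methods `post` and `close`, and the in-memory string standing in for the trace file are all hypothetical names introduced here.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch: the main (simulation) thread posts trace chunks,
// and a worker thread performs the slow write off the main thread.
class TraceWorker {
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::string> m_queue;  // chunks waiting to be written
    bool m_done = false;              // set by close()
    std::string m_sink;               // stands in for the trace file
    std::thread m_thread;             // declared last: starts after the rest

    void run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (true) {
            m_cv.wait(lock, [this] { return m_done || !m_queue.empty(); });
            if (m_queue.empty()) return;  // only reached once m_done is set
            std::string chunk = std::move(m_queue.front());
            m_queue.pop();
            lock.unlock();
            m_sink += chunk;  // the "IO" happens without holding the lock
            lock.lock();
        }
    }

public:
    TraceWorker() : m_thread(&TraceWorker::run, this) {}
    ~TraceWorker() { close(); }

    // Called from the main thread: hand a chunk over and return at once.
    void post(std::string chunk) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(chunk));
        }
        m_cv.notify_one();
    }

    // Drain the queue, stop the worker, and return everything written.
    const std::string& close() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_one();
        if (m_thread.joinable()) m_thread.join();
        return m_sink;
    }
};
```

In this sketch the main thread pays only for the queue push, which matches the trade-off described above: lower wall-clock time at the cost of extra total CPU.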
2020-04-21 23:49:07 +01:00
Wilson Snyder
6ab51de96d Fix dockerfile 2020-04-21 18:37:53 -04:00
James Hanlon
97cbc10925 Add --flatten for use with --xml-only (#2270). 2020-04-21 18:14:08 -04:00
James Hanlon
65cd4f6047 Fix comment and add to CONTRIBUTORS (#2270). 2020-04-21 18:11:53 -04:00
Wilson Snyder
174fd1bf0e Codacy cleanups. No functional change. 2020-04-20 22:01:47 -04:00
Wilson Snyder
b12413e42f Tests: Reenable some tests incorrectly marked unsupported. 2020-04-20 21:55:23 -04:00
Wilson Snyder
7709130d93 Fix Codacy badge 2020-04-20 21:52:29 -04:00
Wilson Snyder
15f7685755 Codacy cleanups. No functional change intended. 2020-04-20 21:43:05 -04:00
Wilson Snyder
83c6e9e821 Commentary commit for Codacy. 2020-04-20 21:13:43 -04:00
Veripool API Bot
1cacb1deab Commentary commit for Codacy. 2020-04-20 20:01:59 -04:00
Veripool API Bot
03bc8b7480 Commentary commit for Codacy. 2020-04-20 19:54:07 -04:00
Veripool API Bot
7d6668a3bd Commentary commit for Codacy. 2020-04-20 19:38:21 -04:00
Wilson Snyder
def40fab9b Internals: Rename VSigning 2020-04-19 21:19:09 -04:00
Wilson Snyder
fceedd9f4d Tests: Update static test. 2020-04-19 21:18:57 -04:00
Wilson Snyder
4272f2116e Tests: Update static test. 2020-04-19 20:10:07 -04:00
Geza Lore
39d903375b
Factor out trace implementation common to all formats. (#2268)
This patch de-duplicates common functionality between the VCD and FST
trace implementation. It also enables adding new trace formats more
easily and consistently.

No functional nor performance change intended.
2020-04-19 23:57:36 +01:00
Wilson Snyder
9164eb03d5 Show that class parameters even if unused are unsupported. 2020-04-19 18:36:55 -04:00
Wilson Snyder
7b789fe02a Docker: Add ccache and libgoogle-perftools-dev 2020-04-19 12:59:38 -04:00
Wilson Snyder
4ae3d3af71 Fix docker build error 2020-04-19 12:43:20 -04:00
Geza Lore
6a54922044
Set FST timescale correctly. (#2266)
The FST trace timescale used to be set in the constructor via
set_time_unit, but at that point we haven't normally opened the
file yet so it was just dropped. On top of that, we actually want
to use set_time_resolution... FST trace timescales now match the VCD.
2020-04-19 08:47:22 -04:00
Wilson Snyder
466535abdc Support direct class member init. 2020-04-18 20:20:17 -04:00
Geza Lore
efacac2e3d
Tests: Ignore SystemC file paths in expected test results (#2265) 2020-04-18 18:56:19 -04:00
Geza Lore
74e16d85c5
Fix FST trace initial time stamp. (#2264)
If the first dump was not at time zero, then the FST trace used
to contain the initial values as if they were set at time zero. Now
they only appear at the time the first dump call is actually made,
and hence match the VCD trace exactly.
2020-04-18 18:54:02 -04:00
Wilson Snyder
39d7cbf412 Fix arrayed instances connecting to slices, #2263. 2020-04-17 19:30:53 -04:00
Wilson Snyder
8f7e463656 Tests: Fix makeflag test, was failing older makes. 2020-04-16 17:31:41 -04:00
Wilson Snyder
e6f345e45d Internal: clang-tidy fixes. No functional change. 2020-04-15 21:47:37 -04:00
Wilson Snyder
d4f7f5297a
Support IEEE time units and time precisions, #234. (#2253)
Includes `timescale, $printtimescale, $timeformat.
VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed
and the time precision must now match the SystemC time precision.
To get closer behavior to older versions, use e.g. --timescale-override
"1ps/1ps".
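
A timescale string such as "1ps/1ps" pairs a time unit with a time precision. As a rough sketch of how such a string maps to powers of ten, the following parser accepts the IEEE magnitudes 1, 10 and 100 with the units s, ms, us, ns, ps and fs. The function names `parseTimeValue` and `parseTimescale` are hypothetical and are not Verilator's API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: parse one half of a timescale, e.g. "1ps" or "100ns",
// into a power-of-ten exponent in seconds ("1ps" gives -12).
// Returns false if the text is malformed.
bool parseTimeValue(const std::string& text, int& exponent) {
    std::string::size_type pos = 0;
    int magnitude = 0;
    if (text.compare(0, 3, "100") == 0) {
        magnitude = 2;
        pos = 3;
    } else if (text.compare(0, 2, "10") == 0) {
        magnitude = 1;
        pos = 2;
    } else if (text.compare(0, 1, "1") == 0) {
        magnitude = 0;
        pos = 1;
    } else {
        return false;
    }
    const std::string unit = text.substr(pos);
    int base = 0;
    if (unit == "s") base = 0;
    else if (unit == "ms") base = -3;
    else if (unit == "us") base = -6;
    else if (unit == "ns") base = -9;
    else if (unit == "ps") base = -12;
    else if (unit == "fs") base = -15;
    else return false;
    exponent = base + magnitude;
    return true;
}

// Hypothetical helper: parse "unit/precision". The precision must be at
// least as fine as the unit, so its exponent may not exceed the unit's.
bool parseTimescale(const std::string& text, int& unitExp, int& precExp) {
    const std::string::size_type slash = text.find('/');
    if (slash == std::string::npos) return false;
    return parseTimeValue(text.substr(0, slash), unitExp)
           && parseTimeValue(text.substr(slash + 1), precExp)
           && precExp <= unitExp;
}
```

With this sketch, "1ps/1ps" yields a unit and precision exponent of -12 each, while "1ps/1ns" is rejected because the precision is coarser than the unit.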
2020-04-15 19:39:03 -04:00
Wilson Snyder
9b8aebb00c Commentary on --build 2020-04-15 18:08:37 -04:00
Wilson Snyder
58091edd68 Tests: Fix cmake -j unknown 2020-04-15 18:08:31 -04:00
Yutetsu TAKATSUKASA
18412f9322
Add --build option to call make/cmake as subprocess (#2249)
* Add --build, -j, -MAKEFLAGS, and --no-verilate options
* Verilator: Can build on both gmake and cmake
2020-04-15 17:44:21 -04:00
Wilson Snyder
1883ab29cb clang-format 10.0 forward compatibility. No functional change. 2020-04-15 17:36:57 -04:00
Geza Lore
1a64c7d232
Fix run-time formatting of variable wider than 1023 bits (#2261) 2020-04-15 17:26:15 -04:00
Wilson Snyder
f3308d236b clang-format remaining sources. No functional change. 2020-04-15 07:58:34 -04:00
Wilson Snyder
1b94e3b0e2 Internals: clang-format files needed for #2249. 2020-04-14 19:55:00 -04:00
Geza Lore
08b74e5ab9
Fix crash when formatting constant wider than 1023 bits (#2260) 2020-04-14 18:07:09 -04:00
Wilson Snyder
5c966ec510 clang-format many files. No functional change.
Use nodist/clang_formatter to reformat files that are now clean.
2020-04-13 22:52:23 -04:00
Geza Lore
dc5c259069
Improve tracing performance. (#2257)
* Improve tracing performance.

Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as
  template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and hard
  to predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions from the tracing code.
- Both: Verilator provides clean input, no need to mask out unused bits.
- VCD: pre-compute identifier codes and use memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard to predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing
  code, and only call the FST API when a change is detected.
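
Two of the tactics above, widths as template parameters and change detection through a direct pointer into the previous-value buffer, can be sketched together as follows. This is an illustration under assumptions, not the code from this patch: the name `updateIfChanged` and the choice of counting width in 32-bit words are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch. T_Words is the signal width in 32-bit words, fixed at
// compile time so the loop bound is a constant the compiler can unroll.
// oldp points directly at this signal's slot in the previous-value buffer.
// Returns true, and updates the slot, only when the value has changed.
template <int T_Words>
inline bool updateIfChanged(uint32_t* oldp, const uint32_t* newp) {
    bool changed = false;
    for (int i = 0; i < T_Words; ++i) changed |= (oldp[i] != newp[i]);
    if (changed) std::memcpy(oldp, newp, T_Words * sizeof(uint32_t));
    return changed;
}
```

A caller would emit a value change to the VCD or FST writer only when the function returns true, so unchanged signals cost a compare and nothing else.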

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
============+==============+===============+======================+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
------------+--------------+---------------+----------------------+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   165 %       |   115 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.
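
The VCD tactic of pre-computing identifier codes and copying them on emit can be sketched as below. VCD identifiers are built from printable ASCII characters '!' through '~'; the base-94 encoding, the digit order, and the names `makeVcdId`, `precomputeIds` and `emitId` are assumptions made for illustration and may differ from what the tracing code actually does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical: encode a numeric code as a VCD identifier using the
// 94 printable ASCII characters '!' (33) through '~' (126).
std::string makeVcdId(uint32_t code) {
    std::string id;
    do {
        id += static_cast<char>('!' + code % 94);
        code /= 94;
    } while (code != 0);
    return id;
}

// Hypothetical: build all identifiers once, when the trace is opened.
std::vector<std::string> precomputeIds(uint32_t count) {
    std::vector<std::string> ids;
    ids.reserve(count);
    for (uint32_t code = 0; code < count; ++code) ids.push_back(makeVcdId(code));
    return ids;
}

// Hypothetical: on every emitted change, copy the ready-made identifier into
// the write buffer instead of re-encoding it. Returns the new write pointer.
char* emitId(char* wp, const std::string& id) {
    std::memcpy(wp, id.data(), id.size());
    return wp + id.size();
}
```

The table lookup trades a possible D-cache miss for the divisions and branches of the encoding loop, which is the trade-off the message describes.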
2020-04-14 00:13:10 +01:00
Wilson Snyder
dc27a179e2 Always define VL_SIG etc; the conditional definitions were for SystemPerl, removed long ago. 2020-04-13 19:07:56 -04:00
Wilson Snyder
236e6baa76 clang-format: Loops allowed on single line. 2020-04-13 17:44:19 -04:00
Wilson Snyder
dba88bae3c Support class new. 2020-04-12 18:57:12 -04:00
Wilson Snyder
d4b6e2b2b5 Internals: NodeModule for packages. 2020-04-12 14:53:10 -04:00
Wilson Snyder
1e2d73fc80 Internals: clang-format and refactor taskref pin handling. 2020-04-12 08:26:14 -04:00
Wilson Snyder
ea3acc2d3a Fix --skip-identical, broken by a recent commit. 2020-04-11 20:22:57 -04:00
Nathan Kohagen
152505e879 Fix make install/uninstall for examples/xml_py, #2252. 2020-04-11 18:11:53 -04:00
Wilson Snyder
8e6674066f Tests: Clean before rerunning failing test. 2020-04-11 11:40:15 -04:00
Geza Lore
8b2666cd04
Fix to make trace code allocation dense. (#2250)
This looks like a bits/bytes bug. The affected m_codeInc member
determines how many 32-bit words to allocate in a buffer used to store
previous values of the signal, but this was off by a factor of 8, so
we used to use too much memory.

SweRV VCD tracing speed +6.5% (excluding IO, clang 6.0), due mainly to
reduced D cache misses.
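
A bits/bytes mix-up of this kind can be illustrated with a minimal sketch. The helper names below are hypothetical and the buggy variant is only a guess at the general shape of such an error, not the actual code that was fixed; the only detail taken from the message is that the buffer is counted in 32-bit words and the old count was 8 times too large.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of 32-bit words needed to hold `bits` bits,
// rounded up to a whole word.
constexpr uint32_t wordsForBits(uint32_t bits) {
    return (bits + 31) / 32;
}

// Hypothetical illustration of the mix-up: the width is treated as a byte
// count, so roughly 8x too many words are reserved per signal.
constexpr uint32_t wordsForBitsAsBytes(uint32_t bits) {
    return (bits * 8 + 31) / 32;
}
```

For a 64-bit signal the first helper reserves 2 words and the second 16, which shows how a sparse allocation spreads the previous-value buffer over more cache lines.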
2020-04-11 16:00:43 +01:00
Wilson Snyder
afa8e4c786 Internals: Favor const_iterator. No functional change. 2020-04-11 10:54:42 -04:00
Wilson Snyder
ef211fc9e0 Make sure SystemC always included in -sc mode to prevent ordering issues. 2020-04-11 10:33:40 -04:00