Commit Graph

1608 Commits

Author SHA1 Message Date
Geza Lore
aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Wilson Snyder
b44efe7ef7 Use 'suggest' for consistent wording. 2020-04-28 21:19:19 -04:00
Wilson Snyder
15ad3f46be Fix logical not optimization with empty begin, #2291. 2020-04-28 21:15:20 -04:00
Wilson Snyder
910803e6db Fix error on unpacked connecting to packed, #2288. 2020-04-27 18:38:54 -04:00
Wilson Snyder
87e1c36e4a Support event data type (with some restrictions). 2020-04-25 15:37:46 -04:00
Wilson Snyder
3b37b5b92d Tests: Check output from some unsupported tests. 2020-04-24 08:22:19 -04:00
Geza Lore
10b4678ee6 Make vgen.pl deterministic 2020-04-24 04:53:33 +01:00
Geza Lore
27f4399c31
Fix tests failing on rerun after passing from clean. (#2281) 2020-04-23 21:27:06 -04:00
Wilson Snyder
f93ae707e0 Tests: Add bad option test. 2020-04-23 19:56:26 -04:00
Geza Lore
8208fe8a0e
Fix test failures on Ubuntu 20.04 (#2278)
- Packaged SystemC lives in /usr so needed to update regex in test
driver
- Clang 10 complains about mixed named and positional initializers in
struct definitions.
2020-04-23 17:29:37 -04:00
Wilson Snyder
ace35b3e81 Tests: Add -G test. 2020-04-23 08:05:14 -04:00
Wilson Snyder
2b58e834ee Tests: Rename IVERILOG define for consistency. No functional change. 2020-04-23 08:05:14 -04:00
Wilson Snyder
7176aee852 Internals: Parse fork and delays, but then still report unsupported. 2020-04-22 21:31:40 -04:00
Wilson Snyder
77915f78db Add experimental-only option. 2020-04-21 20:45:23 -04:00
Geza Lore
c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
James Hanlon
97cbc10925 Add --flaten for use with --xml-only (#2270). 2020-04-21 18:14:08 -04:00
Wilson Snyder
174fd1bf0e Codacy cleanups. No functional change. 2020-04-20 22:01:47 -04:00
Wilson Snyder
b12413e42f Tests: Reenable some tests incorrectly marked unsupported. 2020-04-20 21:55:23 -04:00
Wilson Snyder
15f7685755 Codacity cleanups. No functional change intended. 2020-04-20 21:43:05 -04:00
Wilson Snyder
fceedd9f4d Tests: Update static test. 2020-04-19 21:18:57 -04:00
Wilson Snyder
4272f2116e Tests: Update static test. 2020-04-19 20:10:07 -04:00
Geza Lore
6a54922044
Set FST timescale correctly. (#2266)
The FST trace timescale used to be set in the constructor via
set_time_unit, but at that point we haven't normally opened the
file yet so it was just dropped. On top of that, we actually want
to use set_time_resolution... FST trace timescales now match the VCD.
2020-04-19 08:47:22 -04:00
Wilson Snyder
466535abdc Support direct class member init. 2020-04-18 20:20:17 -04:00
Geza Lore
efacac2e3d
Tests: Ignore SystemC file paths in expected test results (#2265) 2020-04-18 18:56:19 -04:00
Geza Lore
74e16d85c5
Fix FST trace initial time stamp. (#2264)
If the first dump was not at time zero, then the FST trace used
to contain the initial values as if they were set at time zero. Now
they only appear at the time the first dump call is actually made,
and hence match the VCD trace exactly.
2020-04-18 18:54:02 -04:00
Wilson Snyder
39d7cbf412 Fix arrayed instances connecting to slices, #2263. 2020-04-17 19:30:53 -04:00
Wilson Snyder
8f7e463656 Tests: Fix makeflag test, was failing older makes. 2020-04-16 17:31:41 -04:00
Wilson Snyder
d4f7f5297a
Support IEEE time units and time precisions, #234. (#2253)
Includes `timescale, $printtimescale, $timeformat.
VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed
and the time precision must now match the SystemC time precision.
To get closer behavior to older versions, use e.g. --timescale-override
"1ps/1ps".
2020-04-15 19:39:03 -04:00
Wilson Snyder
58091edd68 Tests: Fix cmake -j unknown 2020-04-15 18:08:31 -04:00
Yutetsu TAKATSUKASA
18412f9322
Add --build option to call make/cmake as subprocess (#2249)
* Add --build, -j, -MAKEFLAGS, and --no-verilate options
* Verilator: Can build on both gmake and cmake
2020-04-15 17:44:21 -04:00
Geza Lore
1a64c7d232
Fix run-time formatting of variable wider than 1023 bits (#2261) 2020-04-15 17:26:15 -04:00
Geza Lore
08b74e5ab9
Fix crash when formatting constant wider than 1023 bits (#2260) 2020-04-14 18:07:09 -04:00
Geza Lore
dc5c259069
Improve tracing performance. (#2257)
* Improve tracing performance.

Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as
  template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and hard
  to predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions form the tracing code.
- Both: Verilator provides clean input, no need to mask out used bits.
- VCD: pre-compute identifier codes and use memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard to predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing
  code, and only call the FST API when a change is detected.

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+-----------------------------------------------------+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
============+==============+===============+======================+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   256 %       |   215 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.
2020-04-14 00:13:10 +01:00
Wilson Snyder
dba88bae3c Support class new. 2020-04-12 18:57:12 -04:00
Wilson Snyder
ea3acc2d3a Fix --skip-identical broke recent commit. 2020-04-11 20:22:57 -04:00
Wilson Snyder
8e6674066f Tests: Clean before rerunning failing test. 2020-04-11 11:40:15 -04:00
Wilson Snyder
15b40a97d9 Support `unconnected_drive 2020-04-09 23:26:03 -04:00
Wilson Snyder
608d5a87d1 tests: Avoid assuming a timescale. 2020-04-07 20:55:47 -04:00
Geza Lore
0cfa828572 Fix DPI import/export to be standard compliant, #2236. 2020-04-07 19:07:47 -04:00
Wilson Snyder
b6c21ad21a Fix duplicate traces with $dumpfile, part of #2237. 2020-04-06 08:33:51 -04:00
Wilson Snyder
26301a4133 Commentary 2020-04-06 08:19:32 -04:00
Wilson Snyder
383f9832d4 Tests: Standardize verilog indentation. 2020-04-05 21:53:24 -04:00
Wilson Snyder
a331397954 Fix real conversion from constant with X/Z. 2020-04-05 11:56:15 -04:00
Wilson Snyder
a494ad5ec7 Support $ferror, #1638. 2020-04-05 11:22:05 -04:00
Wilson Snyder
e55338f927 Support $fflush without arguments, #1638. 2020-04-05 10:11:28 -04:00
Wilson Snyder
6eadb8e771 Add simplistic class support with many restrictions, see manual, #377. 2020-04-05 09:30:23 -04:00
Wilson Snyder
bf17bb4648 Fix codacity warnings 2020-04-04 20:08:58 -04:00
Wilson Snyder
c288a7bfb9 Add GCC10-style line number prefix when showing source text for errors. 2020-04-03 20:07:46 -04:00
Marco Widmer
7f9aa057bf
Support split_var in vit files (#2219) 2020-04-03 08:08:23 -04:00
Wilson Snyder
dcde026bac With --Wpedantic, report forward typedefs that are unused. 2020-04-02 07:39:14 -04:00