Commit Graph

85 Commits

Author SHA1 Message Date
Geza Lore
b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the other there is a very small (< 1%) penalty for
this on SwerRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Geza Lore
c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
Geza Lore
39d903375b
Factor out trace implementation common to all formats. (#2268)
This patch de-duplicates common functionality between the VCD and FST
trace implementation. It also enables adding new trace formats more
easily and consistently.

No functional nor performance change intended.
2020-04-19 23:57:36 +01:00
Geza Lore
6a54922044
Set FST timescale correctly. (#2266)
The FST trace timescale used to be set in the constructor via
set_time_unit, but at that point we haven't normally opened the
file yet so it was just dropped. On top of that, we actually want
to use set_time_resolution... FST trace timescales now match the VCD.
2020-04-19 08:47:22 -04:00
Geza Lore
74e16d85c5
Fix FST trace initial time stamp. (#2264)
If the first dump was not at time zero, then the FST trace used
to contain the initial values as if they were set at time zero. Now
they only appear at the time the first dump call is actually made,
and hence match the VCD trace exactly.
2020-04-18 18:54:02 -04:00
Wilson Snyder
d4f7f5297a
Support IEEE time units and time precisions, #234. (#2253)
Includes `timescale, $printtimescale, $timeformat.
VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed
and the time precision must now match the SystemC time precision.
To get closer behavior to older versions, use e.g. --timescale-override
"1ps/1ps".
2020-04-15 19:39:03 -04:00
Geza Lore
dc5c259069
Improve tracing performance. (#2257)
* Improve tracing performance.

Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as
  template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and hard
  to predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions form the tracing code.
- Both: Verilator provides clean input, no need to mask out used bits.
- VCD: pre-compute identifier codes and use memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard to predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing
  code, and only call the FST API when a change is detected.

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+-----------------------------------------------------+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
============+==============+===============+======================+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   256 %       |   215 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.
2020-04-14 00:13:10 +01:00
Nathan Myers
4c1ae4701a
Add assertion for monotonic dump times #2103 (#2237) 2020-04-09 19:00:27 -04:00
Geza Lore
991d8b178b
Fix FST tracing performance by removing std::map from hot path. (#2244)
This patch eliminates a major piece of inefficiency in FST tracing
support, by using an array to lookup fstHandle values corresponding
to trace codes, instead of a tree based std::map. With this change, FST
tracing is now only about 3x slower than VCD tracing. We do require
more memory to store the symbol lookup table, but the size of that is
still small, for the speed benefit.
2020-04-08 17:54:35 -04:00
Wilson Snyder
e07e9390f6 Internals: clang-format cleanups. No functional change. 2020-04-04 14:09:21 -04:00
Wilson Snyder
1ce360ed5b Add SPDX license identifiers. No functional change. 2020-03-21 11:24:24 -04:00
Wilson Snyder
6c6d70a5e5 Fix FST when multi-model tracing. 2020-03-07 18:39:58 -05:00
Wilson Snyder
30a33a6104 Add support for and , #2126. 2020-03-01 21:39:23 -05:00
Wilson Snyder
f7bad37e88 Fix GCC 4.4.7 errors. 2020-02-08 07:09:41 -05:00
Wilson Snyder
9978cbfa5c Fix tracing -1 index arrays. Closes #2090. 2020-01-08 07:32:31 -05:00
Wilson Snyder
f23fe8fd84 Update copyright year. 2020-01-06 18:05:53 -05:00
Wilson Snyder
521418d832 Update FST trace API for better performance. 2019-12-10 18:55:09 -05:00
Wilson Snyder
700f2072c0 Framework for WDatas being vectors of 64-bit EDatas, but not supporting this at this time. 2019-12-08 21:36:38 -05:00
Wilson Snyder
66727670f9 Override gtkwave config in upstreamable way. No functional change. 2019-11-16 03:54:05 -05:00
Wilson Snyder
f87107e757 Tests etc: Cleanup some clang-format suggestions. No functional change. 2019-11-09 20:35:12 -05:00
Wilson Snyder
7311c84937 Slightly faster FST tracing. 2019-10-18 19:48:27 -04:00
Wilson Snyder
6ffbb7cabf Internals: Remove extra semicolons. No functional change. 2019-06-11 18:31:06 -04:00
Wilson Snyder
96c70ea2df Internals: Fix some long lines in include files. No functional change.
When merging, recommend using "git merge -Xignore-all-space"
2019-05-14 22:49:32 -04:00
Wilson Snyder
55a25674a2 Add --trace-fst-thread. 2019-05-02 20:33:05 -04:00
Wilson Snyder
6b3304320b For FST tracing use LZ4 compression. 2019-05-02 19:55:16 -04:00
Wilson Snyder
8a4aeddbb0 Copyright year update. 2019-01-03 19:17:22 -05:00
Wilson Snyder
47107a5a36 Fix FST tracing of wide arrays, bug1376. 2018-12-18 20:49:44 -05:00
Wilson Snyder
304a24d03a Internals: Fix many clang-tidy issues. No functional change intended. 2018-10-14 18:39:33 -04:00
Wilson Snyder
d87b9d25ca Internals: Cleanup and standardize include order. No functional change intended. 2018-10-14 13:59:40 -04:00
Wilson Snyder
595419b370 Internals: Sort includes for clang-tidy. No functional change intended. 2018-10-14 07:04:18 -04:00
Wilson Snyder
cc45a3dd72 For --trace-fst, save enum decoding information, bug1358. 2018-10-08 07:21:22 -04:00
Wilson Snyder
9ad252be34 For --trace-fst, instead of *.fst.hier, put data into *.fst. 2018-10-07 20:47:26 -04:00
Wilson Snyder
d11592cadd Support signal types in FST dumps, bug1358. 2018-10-04 20:24:41 -04:00
Wilson Snyder
8ef9ac7dba Support in/out directions in FST dumps, bug1358. 2018-10-03 19:51:05 -04:00
Sergi Granell
a5aa0e2b0a Add GTKWave FST native tracing, bug1356.
Signed-off-by: Wilson Snyder <wsnyder@wsnyder.org>
2018-10-02 18:42:53 -04:00