Commit Graph

75 Commits

Author SHA1 Message Date
Larry Doolittle
2ab70ba452
Internals: Cleanup .txt file whitespace (#3842) 2023-01-05 05:00:54 -05:00
Wilson Snyder
b24d7c83d3 Copyright year update 2023-01-01 10:18:39 -05:00
github action
821dd070bf Apply 'make format' 2022-11-23 09:08:02 +00:00
Yves Mathieu
06fdf7be58
Add support of Events for VCD/FST traces (#3759) 2022-11-23 04:07:14 -05:00
Wilson Snyder
8c6d1e53ca Internals: Fix some 'p' names, and make new base class for VlDeleter. No functional change intended. 2022-11-13 17:40:50 -05:00
Kamil Rakoczy
d6126c4b32
Remove --no-threads; require --threads 1 for single threaded (#3703). 2022-11-05 08:47:34 -04:00
Wilson Snyder
a2d26b45bb Internals: Fix some clang-tidy issues. No functional change intended. 2022-07-30 11:54:28 -04:00
Geza Lore
1d400dd98c
Configure tracing at run-time, instead of compile time (#3504)
All remaining use of conditional compilation in the tracing
implementation of the run-time library are replaced with the use of
VerilatedModel::traceConfig, and is now done at run-time.
2022-07-20 11:27:10 +01:00
Geza Lore
f8b7981be4 Make use of FST writer thread switchable at run-time.
Always build the FST libray with -DFST_WRITER_PARALLEL, iff VL_THREADED.
This supports run-time enablement of the FST writer thread, and has no
measurable performance impact on single threaded tracing but simplifies
the library build.

Note: the actual choice of using the fst writer thread is still compile
time, but can now be made run-time easily.
2022-07-19 13:48:03 +01:00
Geza Lore
db59c07f27 Implement trace offloading with fewer ifdefs
Step towards a proper run-time library. Reduce the amount of ifdefs in
the implementation of offloaded tracing. There are still a very small
number of ifdefs left, which will need more careful changes in order to
keep user API compatibility.
2022-07-19 11:31:35 +01:00
Geza Lore
b51f887567
Perform VCD tracing in parallel when using --threads (#3449)
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.

This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.

This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
2022-05-29 19:08:39 +01:00
Geza Lore
a48c779367 Rename verilated_trace_imp.cpp -> verilated_trace_imp.h
Also fix file header to describe purpose of this file.
2022-05-28 12:20:35 +01:00
Wilson Snyder
e02f97854c Deprecate 'vluint64_t' and similar types (#3255). 2022-03-27 15:27:40 -04:00
Wilson Snyder
321880f5a6 Add trace dumpvars() call for selective runtime tracing (#3322). 2022-03-05 15:44:32 -05:00
Jamie Iles
b6ca2a42f2
Fix FST traces to include vector range (#3296) (#3297) 2022-02-26 12:52:24 -05:00
Wilson Snyder
50094ca296 Internals: Add cpplint control file and related cleanups 2022-01-09 16:49:38 -05:00
Wilson Snyder
ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Geza Lore
ff425369ac
Reduce .rodata footprint of trace initialization (#3250)
Trace initialization (tracep->decl* functions) used to explicitly pass
the complete hierarchical names of signals as string constants. This
contains a lot of redundancy (path prefixes), does not scale well with
large designs and resulted in .rodata sections (the string constants) in
ELF executables being extremely large.

This patch changes the API of trace initialization that allows pushing
and popping name prefixes as we walk the hierarchy tree, which are
prepended to declared signal names at run-time during trace
initialization. This in turn allows us to emit repeat path/name
components only once, effectively removing all duplicate path prefixes.
On SweRV EH1 this reduces the .rodata section in a --trace build by 94%.

Additionally, trace declarations are now emitted in lexical order by
hierarchical signal names, and the top level trace initialization
function respects --output-split-ctrace.
2021-12-19 15:15:07 +00:00
Pieter Kapsenberg
d1836b7b6f
Traces show array instances using brackets instead of parens (#3092) (#3095) 2021-08-12 20:40:44 +03:00
Wilson Snyder
b8e804f05b Internals: Some clang-tidy cleanups. No functional change intended. 2021-07-25 13:38:27 -04:00
Wilson Snyder
ab13a2ebdc Internals: Use C++11 const and initializers. No functional change intended. 2021-07-24 08:36:11 -04:00
Wilson Snyder
52cde49a6f Internals: Add more const. No functional change. 2021-06-18 22:24:08 -04:00
David Metz
f5ad5cf034
Fix dumping waveforms to multiple FST files (#2889) 2021-04-14 16:52:14 -04:00
github action
52fc134272 Apply clang-format 2021-04-07 13:56:12 +00:00
Àlex Torregrosa
2b2680770b
Improve scope types in FST and VCD traces (#2805). 2021-04-07 09:55:11 -04:00
Wilson Snyder
a1ab295b74 Commentary: Cleanup all include/* header comments. 2021-03-20 17:46:00 -04:00
Wilson Snyder
2cad22a22a
Add simulation context (VerilatedContext) (#2660). (#2813)
**   Add simulation context (VerilatedContext) to allow multiple fully independent
      models to be in the same process.  Please see the updated examples.
**   Add context->time() and context->timeInc() API calls, to set simulation time.
      These now are recommended in place of the legacy sc_time_stamp().
2021-03-07 11:01:54 -05:00
Wilson Snyder
caa9c99837 Commentary 2021-03-07 08:28:13 -05:00
Wilson Snyder
9650aefa42 Internals: Cleanup unneeded {}. No functional change 2021-02-21 21:25:21 -05:00
Wilson Snyder
bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder
ee9d6dd63f C++11: Favor auto, range for. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder
72d2cff0a1 C++11: Use member declaration initalizations. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder
c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Geza Lore
378d947702 Travis: Add FreeBSD build + portability fixes 2020-06-28 15:37:24 +01:00
Wilson Snyder
6ce878cb0d Fix some clang-tidy warnings 2020-06-01 23:16:17 -04:00
Geza Lore
95534fa5c5
Remove unused headers (#2389) 2020-05-31 20:21:07 +01:00
Wilson Snyder
c4f31d3bb6 Tracing: Remove dead code. No functional change intended. 2020-05-17 09:52:03 -04:00
Wilson Snyder
29bcbb0417 Suppress impossible code coverage issues 2020-05-15 22:34:29 -04:00
Geza Lore
8afcd67a1f Fix FST tracing of little endian vectors 2020-05-03 22:39:45 +01:00
Geza Lore
aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Geza Lore
b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the other there is a very small (< 1%) penalty for
this on SwerRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Geza Lore
c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
Geza Lore
39d903375b
Factor out trace implementation common to all formats. (#2268)
This patch de-duplicates common functionality between the VCD and FST
trace implementation. It also enables adding new trace formats more
easily and consistently.

No functional nor performance change intended.
2020-04-19 23:57:36 +01:00
Geza Lore
6a54922044
Set FST timescale correctly. (#2266)
The FST trace timescale used to be set in the constructor via
set_time_unit, but at that point we haven't normally opened the
file yet so it was just dropped. On top of that, we actually want
to use set_time_resolution... FST trace timescales now match the VCD.
2020-04-19 08:47:22 -04:00
Geza Lore
74e16d85c5
Fix FST trace initial time stamp. (#2264)
If the first dump was not at time zero, then the FST trace used
to contain the initial values as if they were set at time zero. Now
they only appear at the time the first dump call is actually made,
and hence match the VCD trace exactly.
2020-04-18 18:54:02 -04:00
Wilson Snyder
d4f7f5297a
Support IEEE time units and time precisions, #234. (#2253)
Includes `timescale, $printtimescale, $timeformat.
VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed
and the time precision must now match the SystemC time precision.
To get closer behavior to older versions, use e.g. --timescale-override
"1ps/1ps".
2020-04-15 19:39:03 -04:00
Geza Lore
dc5c259069
Improve tracing performance. (#2257)
* Improve tracing performance.

Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as
  template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and hard
  to predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions form the tracing code.
- Both: Verilator provides clean input, no need to mask out used bits.
- VCD: pre-compute identifier codes and use memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard to predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing
  code, and only call the FST API when a change is detected.

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+-----------------------------------------------------+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
============+==============+===============+======================+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   256 %       |   215 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.
2020-04-14 00:13:10 +01:00
Nathan Myers
4c1ae4701a
Add assertion for monotonic dump times #2103 (#2237) 2020-04-09 19:00:27 -04:00
Geza Lore
991d8b178b
Fix FST tracing performance by removing std::map from hot path. (#2244)
This patch eliminates a major piece of inefficiency in FST tracing
support, by using an array to lookup fstHandle values corresponding
to trace codes, instead of a tree based std::map. With this change, FST
tracing is now only about 3x slower than VCD tracing. We do require
more memory to store the symbol lookup table, but the size of that is
still small, for the speed benefit.
2020-04-08 17:54:35 -04:00
Wilson Snyder
e07e9390f6 Internals: clang-format cleanups. No functional change. 2020-04-04 14:09:21 -04:00