Commit Graph

121 Commits

Author SHA1 Message Date
Geza Lore
b51f887567
Perform VCD tracing in parallel when using --threads (#3449)
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.

This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.

This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
2022-05-29 19:08:39 +01:00
Geza Lore
a48c779367 Rename verilated_trace_imp.cpp -> verilated_trace_imp.h
Also fix file header to describe purpose of this file.
2022-05-28 12:20:35 +01:00
Geza Lore
d45caca011 Remove legacy VCD tracing API
This has not been used by Verilator for a while, but was kept for
compatibility with some external code. Now removed.
2022-05-28 12:07:24 +01:00
Geza Lore
551bd284dd Rename some internals related to multi-threaded tracing
Rename the implementation internals of current multi-threaded tracing to
be "offload mode". No functional change, nor user interface change
intended.
2022-05-20 16:44:35 +01:00
Wilson Snyder
e02f97854c Deprecate 'vluint64_t' and similar types (#3255). 2022-03-27 15:27:40 -04:00
Wilson Snyder
321880f5a6 Add trace dumpvars() call for selective runtime tracing (#3322). 2022-03-05 15:44:32 -05:00
Wilson Snyder
50094ca296 Internals: Add cpplint control file and related cleanups 2022-01-09 16:49:38 -05:00
Wilson Snyder
4cd56b1fb9 Use C++11 standard types for MacOS portability (#3254) (#3257). 2022-01-01 16:04:20 -05:00
Wilson Snyder
ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Geza Lore
ff425369ac
Reduce .rodata footprint of trace initialization (#3250)
Trace initialization (tracep->decl* functions) used to explicitly pass
the complete hierarchical names of signals as string constants. This
contains a lot of redundancy (path prefixes), does not scale well with
large designs and resulted in .rodata sections (the string constants) in
ELF executables being extremely large.

This patch changes the API of trace initialization that allows pushing
and popping name prefixes as we walk the hierarchy tree, which are
prepended to declared signal names at run-time during trace
initialization. This in turn allows us to emit repeat path/name
components only once, effectively removing all duplicate path prefixes.
On SweRV EH1 this reduces the .rodata section in a --trace build by 94%.

Additionally, trace declarations are now emitted in lexical order by
hierarchical signal names, and the top level trace initialization
function respects --output-split-ctrace.
2021-12-19 15:15:07 +00:00
Wilson Snyder
e87a726989 Internals: More const. No functional change intended. 2021-11-25 09:05:50 -05:00
Pieter Kapsenberg
d1836b7b6f
Traces show array instances using brackets instead of parens (#3092) (#3095) 2021-08-12 20:40:44 +03:00
Wilson Snyder
b8e804f05b Internals: Some clang-tidy cleanups. No functional change intended. 2021-07-25 13:38:27 -04:00
Wilson Snyder
ab13a2ebdc Internals: Use C++11 const and initializers. No functional change intended. 2021-07-24 08:36:11 -04:00
Wilson Snyder
52cde49a6f Internals: Add more const. No functional change. 2021-06-18 22:24:08 -04:00
Yutetsu TAKATSUKASA
39ce2f77f4
Internals: Remove VerilatedVcd::m_suffixesp. No functional change is intended. (#2933)
Reasons are:
- it's error prone to keep updating whennever m_suffixes is resized
- invalid pointer may be set when there is not signal to trace as in t_trace_dumporder_bad
2021-05-07 20:45:58 -04:00
Yutetsu TAKATSUKASA
9797af0ad4
Introduce a macro VL_ATTR_NO_SANITIZE_ALIGN to suppress unaligned access check in ubsan (#2929)
* Add VL_ATTR_NO_SANITIZE_ALIGN macro to disable alignment check of ubsan

* Mark a function VL_ATTR_NO_SANITIZE_ALIGN because the function is intentionally using unaligned access for the sake of performance.

Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
2021-05-08 07:16:40 +09:00
Àlex Torregrosa
2b2680770b
Improve scope types in FST and VCD traces (#2805). 2021-04-07 09:55:11 -04:00
Wilson Snyder
ca01d6f18d Internals: Add some std::'s. No functional change intended. 2021-03-26 21:23:18 -04:00
Wilson Snyder
2e158d88c1 Commentary. Remove dox comments from private members, 2021-03-20 21:11:53 -04:00
Wilson Snyder
a1ab295b74 Commentary: Cleanup all include/* header comments. 2021-03-20 17:46:00 -04:00
Wilson Snyder
9483ebefae Internal code coverage cleanups. 2021-03-07 21:05:15 -05:00
Wilson Snyder
2cad22a22a
Add simulation context (VerilatedContext) (#2660). (#2813)
**   Add simulation context (VerilatedContext) to allow multiple fully independent
      models to be in the same process.  Please see the updated examples.
**   Add context->time() and context->timeInc() API calls, to set simulation time.
      These now are recommended in place of the legacy sc_time_stamp().
2021-03-07 11:01:54 -05:00
Wilson Snyder
caa9c99837 Commentary 2021-03-07 08:28:13 -05:00
Wilson Snyder
47dcbd4b8a Internal: Remove deprecated/insecure functions. No functional change intended. 2021-03-06 10:34:03 -05:00
Wilson Snyder
9650aefa42 Internals: Cleanup unneeded {}. No functional change 2021-02-21 21:25:21 -05:00
github action
af5fc4f1ad Apply clang-format 2021-02-03 19:41:25 +00:00
Àlex Torregrosa
e77e4e1fe6
Improve struct scopes when dumping structs to VCD (#2776) 2021-02-03 14:40:21 -05:00
Wilson Snyder
bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder
c39a8b439a Internals: Use emplace instead of insert(make_pair(...)). No functional change intended. 2020-12-18 18:24:47 -05:00
Yutetsu TAKATSUKASA
ce9293fcb3
Fix memory leaks found in trace related tests (#2708)
* Fix memory leak of t_trace_cat and t_trace_cat_renew

* Fix memory leak of t_trace_c_api

* Fix memory leak in t_trace_public_func and t_trace_public_func_vlt

* Fix memory leaks in t_flat_build (and probably more).

* Use unique_ptr in testcases
2020-12-17 08:31:47 +09:00
Wilson Snyder
7d05be802d Misc internal coverage hole and related bug fixes 2020-12-09 19:18:12 -05:00
Wilson Snyder
6013b54f7b clang-tidy cleanups. No functional change intended. 2020-08-16 14:55:46 -04:00
Wilson Snyder
ee9d6dd63f C++11: Favor auto, range for. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder
72d2cff0a1 C++11: Use member declaration initalizations. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder
c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Geza Lore
fac89c5d62
Close trace on vl_fatal/vl_finish (#2414)
This is required to get the last bit of FST trace and close the FST file
properly on $stop or assertion failure.
2020-06-12 07:15:42 +01:00
Geza Lore
95534fa5c5
Remove unused headers (#2389) 2020-05-31 20:21:07 +01:00
Wilson Snyder
5089ac6119 Remove VL_ULL as ULL now in MSVC & C++11 2020-05-28 20:32:07 -04:00
Wilson Snyder
773ed97504 Internals Most VerilatedLockGuard can be const. No functional change intended. 2020-05-28 18:23:46 -04:00
Wilson Snyder
6a882f9dc6 Internal code coverage improvements. No functional change intended. 2020-05-23 10:34:58 -04:00
Wilson Snyder
c4f31d3bb6 Tracing: Remove dead code. No functional change intended. 2020-05-17 09:52:03 -04:00
Wilson Snyder
6fd7f45cef Internals: Remove dead needHInlines code 2020-05-16 07:53:27 -04:00
Wilson Snyder
57a937df03 Misc internal coverage cleanups 2020-05-16 07:43:22 -04:00
Wilson Snyder
35a53d9adb Add t_trace_c_api test. 2020-05-15 20:38:08 -04:00
Geza Lore
aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Geza Lore
b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the other there is a very small (< 1%) penalty for
this on SwerRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Geza Lore
9991b19610 Another attempt at flushing threaded VCD correctly. 2020-04-25 18:40:09 +01:00
Geza Lore
c1665818b9
Fix missing flush with threaded VCD tracing. (#2282)
VerilatedVcdC::openNext() failed to flush the tracing thread before
opening the next output file, which caused t_trace_cat.pl to fail
with --vltmt on occasion.
2020-04-24 03:09:26 +01:00
Geza Lore
c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00