verilator

Author	SHA1	Message	Date
Larry Doolittle	2ab70ba452	Internals: Cleanup .txt file whitespace (#3842 )	2023-01-05 05:00:54 -05:00
Wilson Snyder	b24d7c83d3	Copyright year update	2023-01-01 10:18:39 -05:00
github action	821dd070bf	Apply 'make format'	2022-11-23 09:08:02 +00:00
Yves Mathieu	06fdf7be58	Add support of Events for VCD/FST traces (#3759 )	2022-11-23 04:07:14 -05:00
Wilson Snyder	8c6d1e53ca	Internals: Fix some 'p' names, and make new base class for VlDeleter. No functional change intended.	2022-11-13 17:40:50 -05:00
Kamil Rakoczy	d6126c4b32	Remove --no-threads; require --threads 1 for single threaded (#3703 ).	2022-11-05 08:47:34 -04:00
Wilson Snyder	a2d26b45bb	Internals: Fix some clang-tidy issues. No functional change intended.	2022-07-30 11:54:28 -04:00
Geza Lore	1d400dd98c	Configure tracing at run-time, instead of compile time (#3504 ) All remaining uses of conditional compilation in the tracing implementation of the run-time library are replaced with the use of VerilatedModel::traceConfig, so the configuration is now done at run-time.	2022-07-20 11:27:10 +01:00
Geza Lore	f8b7981be4	Make use of FST writer thread switchable at run-time. Always build the FST library with -DFST_WRITER_PARALLEL, iff VL_THREADED. This supports run-time enablement of the FST writer thread, and has no measurable performance impact on single threaded tracing but simplifies the library build. Note: the actual choice of using the FST writer thread is still compile time, but can now be made run-time easily.	2022-07-19 13:48:03 +01:00
Geza Lore	db59c07f27	Implement trace offloading with fewer ifdefs Step towards a proper run-time library. Reduce the amount of ifdefs in the implementation of offloaded tracing. There are still a very small number of ifdefs left, which will need more careful changes in order to keep user API compatibility.	2022-07-19 11:31:35 +01:00
Geza Lore	b51f887567	Perform VCD tracing in parallel when using --threads (#3449 ) VCD tracing is now parallelized using the same thread pool as the model. We achieve this by breaking the top level trace functions into multiple top level functions (as many as --threads), and after emitting the time stamp to the VCD file on the main thread, we execute the tracing functions in parallel on the same thread pool as the model (which we pass to the trace file during registration), tracing into a secondary per thread buffer. The main thread will then stitch (memcpy) the buffers together into the output file. This makes the `--trace-threads` option redundant with `--trace`, which now only affects `--trace-fst`. FST tracing uses the previous offloading scheme. This obviously helps a lot in VCD tracing performance, and I have seen better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on OpenTitan 4T).	2022-05-29 19:08:39 +01:00
Geza Lore	a48c779367	Rename verilated_trace_imp.cpp -> verilated_trace_imp.h Also fix file header to describe purpose of this file.	2022-05-28 12:20:35 +01:00
Wilson Snyder	e02f97854c	Deprecate 'vluint64_t' and similar types (#3255 ).	2022-03-27 15:27:40 -04:00
Wilson Snyder	321880f5a6	Add trace dumpvars() call for selective runtime tracing (#3322 ).	2022-03-05 15:44:32 -05:00
Jamie Iles	b6ca2a42f2	Fix FST traces to include vector range (#3296 ) (#3297 )	2022-02-26 12:52:24 -05:00
Wilson Snyder	50094ca296	Internals: Add cpplint control file and related cleanups	2022-01-09 16:49:38 -05:00
Wilson Snyder	ca42be982c	Copyright year update.	2022-01-01 08:26:40 -05:00
Geza Lore	ff425369ac	Reduce .rodata footprint of trace initialization (#3250 ) Trace initialization (tracep->decl* functions) used to explicitly pass the complete hierarchical names of signals as string constants. This contains a lot of redundancy (path prefixes), does not scale well with large designs and resulted in .rodata sections (the string constants) in ELF executables being extremely large. This patch changes the API of trace initialization to allow pushing and popping name prefixes as we walk the hierarchy tree, which are prepended to declared signal names at run-time during trace initialization. This in turn allows us to emit repeated path/name components only once, effectively removing all duplicate path prefixes. On SweRV EH1 this reduces the .rodata section in a --trace build by 94%. Additionally, trace declarations are now emitted in lexical order by hierarchical signal names, and the top level trace initialization function respects --output-split-ctrace.	2021-12-19 15:15:07 +00:00
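The prefix push/pop idea described in ff425369ac can be sketched as follows. This is a minimal illustration only: the class `PrefixedDeclarer` and its methods `pushPrefix`, `popPrefix`, and `declSignal` are hypothetical names invented here, not Verilator's actual trace API. Each call site passes only a short name component, and the full hierarchical name is assembled at run-time.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: NOT Verilator's real trace declaration API.
// Call sites pass short name components; full paths are built at run-time,
// so each path component appears only once as a string constant.
class PrefixedDeclarer {
public:
    // Enter a scope: remember the full path of the new scope.
    void pushPrefix(const std::string& name) { m_prefixStack.push_back(join(name)); }
    // Leave the most recently entered scope.
    void popPrefix() {
        assert(!m_prefixStack.empty());
        m_prefixStack.pop_back();
    }
    // Declare a signal; the current prefix is prepended here.
    void declSignal(const std::string& name) { m_declared.push_back(join(name)); }
    const std::vector<std::string>& declared() const { return m_declared; }

private:
    // Prepend the current scope path (if any) to a name component.
    std::string join(const std::string& name) const {
        return m_prefixStack.empty() ? name : m_prefixStack.back() + "." + name;
    }
    std::vector<std::string> m_prefixStack;  // full path of each open scope
    std::vector<std::string> m_declared;  // full names in declaration order
};

// Walks a tiny made-up hierarchy: top { clk, core { pc, valid }, rst }.
inline std::vector<std::string> exampleDeclarations() {
    PrefixedDeclarer d;
    d.pushPrefix("top");
    d.declSignal("clk");
    d.pushPrefix("core");
    d.declSignal("pc");
    d.declSignal("valid");
    d.popPrefix();
    d.declSignal("rst");
    d.popPrefix();
    return d.declared();
}
```

In this sketch the literal "top" appears once in the source even though four declared names start with it, which is the redundancy the commit message says it removes.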
Pieter Kapsenberg	d1836b7b6f	Traces show array instances using brackets instead of parens (#3092 ) (#3095 )	2021-08-12 20:40:44 +03:00
Wilson Snyder	b8e804f05b	Internals: Some clang-tidy cleanups. No functional change intended.	2021-07-25 13:38:27 -04:00
Wilson Snyder	ab13a2ebdc	Internals: Use C++11 const and initializers. No functional change intended.	2021-07-24 08:36:11 -04:00
Wilson Snyder	52cde49a6f	Internals: Add more const. No functional change.	2021-06-18 22:24:08 -04:00
David Metz	f5ad5cf034	Fix dumping waveforms to multiple FST files (#2889 )	2021-04-14 16:52:14 -04:00
github action	52fc134272	Apply clang-format	2021-04-07 13:56:12 +00:00
Àlex Torregrosa	2b2680770b	Improve scope types in FST and VCD traces (#2805 ).	2021-04-07 09:55:11 -04:00
Wilson Snyder	a1ab295b74	Commentary: Cleanup all include/* header comments.	2021-03-20 17:46:00 -04:00
Wilson Snyder	2cad22a22a	Add simulation context (VerilatedContext) (#2660 ). (#2813 ) Add simulation context (VerilatedContext) to allow multiple fully independent models to be in the same process. Please see the updated examples. Add context->time() and context->timeInc() API calls, to set simulation time. These now are recommended in place of the legacy sc_time_stamp().	2021-03-07 11:01:54 -05:00
Wilson Snyder	caa9c99837	Commentary	2021-03-07 08:28:13 -05:00
Wilson Snyder	9650aefa42	Internals: Cleanup unneeded {}. No functional change	2021-02-21 21:25:21 -05:00
Wilson Snyder	bd602d0e2d	Copyright year update	2021-01-01 10:29:54 -05:00
Wilson Snyder	ee9d6dd63f	C++11: Favor auto, range for. No functional change intended.	2020-08-16 11:44:06 -04:00
Wilson Snyder	72d2cff0a1	C++11: Use member declaration initializations. No functional change intended.	2020-08-16 11:44:06 -04:00
Wilson Snyder	c0127599df	C++11: Use nullptr. No functional change.	2020-08-16 11:44:05 -04:00
Geza Lore	378d947702	Travis: Add FreeBSD build + portability fixes	2020-06-28 15:37:24 +01:00
Wilson Snyder	6ce878cb0d	Fix some clang-tidy warnings	2020-06-01 23:16:17 -04:00
Geza Lore	95534fa5c5	Remove unused headers (#2389 )	2020-05-31 20:21:07 +01:00
Wilson Snyder	c4f31d3bb6	Tracing: Remove dead code. No functional change intended.	2020-05-17 09:52:03 -04:00
Wilson Snyder	29bcbb0417	Suppress impossible code coverage issues	2020-05-15 22:34:29 -04:00
Geza Lore	8afcd67a1f	Fix FST tracing of little endian vectors	2020-05-03 22:39:45 +01:00
Geza Lore	aa9cde22c8	Use SIMD intrinsics to render VCD traces (#2289 ) Use SIMD intrinsics to render VCD traces. I have measured 10-40% single threaded performance increase with VCD tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render the trace. Also helps a tiny bit with FST, but now almost all of the FST overhead is in the FST library. I have reworked the tracing routines to use more precisely sized arguments. The nice thing about this is that the performance without the intrinsics is pretty much the same as it was before, as we do at most 2x as much work as necessary, but in exchange there are no data dependent branches at all.	2020-04-30 00:09:09 +01:00
Geza Lore	b79ef672e1	Various minor optimizations of VCD trace routines - Change templated trace routines to branch table. Removed templating from trace chgBus and fullBus and replaced them with a branch table like the others. There is a very small (< 1%) penalty for this on SweRV EH1 CoreMark, but this is less than the variability of disk IO so it's worth it to keep the code simpler and smaller. - Prefetch VCD suffix buffer at the top of emit* - Increase ILP in VCD emit* routines - Use a 64-bit unaligned store to emit the VCD suffix (on x86 only). The performance difference with these is very small, but the changes hopefully make this code more performance-portable across various micro-architectures.	2020-04-27 18:44:53 +01:00
Geza Lore	c52f3349d1	Initial implementation of generic multithreaded tracing (#2269 ) The --trace-threads option can now be used to perform tracing on a thread separate from the main thread when using VCD tracing (with --trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and --trace-fst --trace-threads 1 is the same as what --trace-fst-thread used to be (which is now deprecated). Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @ 3.40GHz, IO to ramdisk, with numactl set to schedule threads on different physical cores. Relative speedup: --trace -> --trace --trace-threads 1: +22%; --trace-fst -> --trace-fst --trace-threads 1: +38% (as --trace-fst-thread); --trace-fst -> --trace-fst --trace-threads 2: +93%. Speed relative to --trace with no threaded tracing: --trace: 1.00 x; --trace --trace-threads 1: 0.82 x; --trace-fst: 1.79 x; --trace-fst --trace-threads 1: 1.23 x; --trace-fst --trace-threads 2: 0.87 x. This means FST tracing with 2 extra threads is now faster than single threaded VCD tracing, and is on par with threaded VCD tracing. You do pay for it in total compute though, as --trace-fst --trace-threads 2 uses about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for --trace --trace-threads 1. Still, for interactive use it should be helpful with large designs.	2020-04-21 23:49:07 +01:00
Geza Lore	39d903375b	Factor out trace implementation common to all formats. (#2268 ) This patch de-duplicates common functionality between the VCD and FST trace implementation. It also enables adding new trace formats more easily and consistently. No functional nor performance change intended.	2020-04-19 23:57:36 +01:00
Geza Lore	6a54922044	Set FST timescale correctly. (#2266 ) The FST trace timescale used to be set in the constructor via set_time_unit, but at that point we haven't normally opened the file yet so it was just dropped. On top of that, we actually want to use set_time_resolution... FST trace timescales now match the VCD.	2020-04-19 08:47:22 -04:00
Geza Lore	74e16d85c5	Fix FST trace initial time stamp. (#2264 ) If the first dump was not at time zero, then the FST trace used to contain the initial values as if they were set at time zero. Now they only appear at the time the first dump call is actually made, and hence match the VCD trace exactly.	2020-04-18 18:54:02 -04:00
Wilson Snyder	d4f7f5297a	Support IEEE time units and time precisions, #234 . (#2253 ) Includes `timescale, $printtimescale, $timeformat. VL_TIME_MULTIPLIER, VL_TIME_PRECISION, VL_TIME_UNIT have been removed and the time precision must now match the SystemC time precision. To get closer behavior to older versions, use e.g. --timescale-override "1ps/1ps".	2020-04-15 19:39:03 -04:00
Geza Lore	dc5c259069	Improve tracing performance. (#2257 )
Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates to take variable widths as template parameters. For VCD, subsequently specialize these to the values used by Verilator. This avoids redundant instructions and hard to predict branches.
- Both: Check for value changes via direct pointer access into the previous signal value buffer. This eliminates a lot of simple pointer arithmetic instructions from the tracing code.
- Both: Verilator provides clean input, no need to mask out used bits.
- VCD: pre-compute identifier codes and use memory copy instead of re-computing them every time a code is emitted. This saves a lot of instructions and hard to predict branches. The added D-cache misses are cheaper than the removed branches/instructions.
- VCD: re-write the routines emitting the changes to be more efficient.
- FST: Use previous signal value buffer the same way as the VCD tracing code, and only call the FST API when a change is detected.
Performance as measured on SweRV EH1, with the pre-canned CoreMark benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz, and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+-----------------------------------------------------+
Before      | 30.2 s       | 121.1 s       | 69.8 s               |
============+==============+===============+======================+
After       | 24.7 s       | 45.7 s        | 32.4 s               |
------------+--------------+---------------+----------------------+
Speedup     | 22 %         | 256 %         | 215 %                |
------------+--------------+---------------+----------------------+
Rel. to VCD | 1 x          | 1.85 x        | 1.31 x               |
------------+--------------+---------------+----------------------+

In addition, FST trace size for the above reduced by 48%.	
2020-04-14 00:13:10 +01:00
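Two of the tactics listed in dc5c259069 (checking for changes against a previous-value buffer, and pre-computing VCD identifier codes) can be sketched roughly as below. This is an illustration under stated assumptions, not Verilator's implementation: `ChangeTracer`, `chgBit`, and `makeId` are hypothetical names, and the base-94 identifier encoding over the printable characters starting at `!` is just one plausible scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: names and identifier encoding are illustrative only.
class ChangeTracer {
public:
    explicit ChangeTracer(std::size_t nSignals)
        : m_prev(nSignals, 0) {
        // Pre-compute every identifier once, instead of on each emit.
        m_ids.reserve(nSignals);
        for (std::size_t code = 0; code < nSignals; ++code) m_ids.push_back(makeId(code));
    }

    // Record a 1-bit signal; emits text only if the value actually changed.
    // Note: previous values start at 0 here; a real tracer would emit a
    // full dump of all signals first.
    void chgBit(std::size_t code, std::uint8_t newval) {
        assert(code < m_prev.size());
        if (m_prev[code] == newval) return;  // unchanged: nothing to emit
        m_prev[code] = newval;
        m_out += (newval ? '1' : '0');  // VCD scalar change: value, then id
        m_out += m_ids[code];
        m_out += '\n';
    }

    const std::string& output() const { return m_out; }

    // Encode a signal code as printable characters ('!' upward, base 94).
    static std::string makeId(std::size_t code) {
        std::string id;
        do {
            id += static_cast<char>('!' + code % 94);
            code /= 94;
        } while (code != 0);
        return id;
    }

private:
    std::vector<std::uint8_t> m_prev;  // previous value of each signal
    std::vector<std::string> m_ids;  // pre-computed identifier per signal
    std::string m_out;  // accumulated VCD change records
};
```

The design trade-off mirrors the one stated in the commit message: the identifier table costs memory and possible cache misses, in exchange for removing per-emit computation and branches.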
Nathan Myers	4c1ae4701a	Add assertion for monotonic dump times #2103 (#2237 )	2020-04-09 19:00:27 -04:00
Geza Lore	991d8b178b	Fix FST tracing performance by removing std::map from hot path. (#2244 ) This patch eliminates a major piece of inefficiency in FST tracing support, by using an array to lookup fstHandle values corresponding to trace codes, instead of a tree based std::map. With this change, FST tracing is now only about 3x slower than VCD tracing. We do require more memory to store the symbol lookup table, but the size of that is still small, for the speed benefit.	2020-04-08 17:54:35 -04:00
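The lookup change described in 991d8b178b, replacing a tree-based `std::map` with an array indexed by trace code, might look roughly like this. It is a sketch assuming trace codes are small dense integers; `Handle` and `HandleTable` are hypothetical stand-ins, not the FST library's or Verilator's real types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the handle type returned by a trace library.
using Handle = std::uint32_t;

// Array-backed lookup from trace code to handle. Compared with
// std::map<code, Handle>, a lookup is a single indexed load rather than a
// tree walk, at the cost of one slot per possible code.
class HandleTable {
public:
    explicit HandleTable(std::size_t maxCode)
        : m_handles(maxCode + 1, 0) {}

    void set(std::uint32_t code, Handle handle) {
        assert(code < m_handles.size());
        m_handles[code] = handle;
    }

    Handle get(std::uint32_t code) const {
        assert(code < m_handles.size());
        return m_handles[code];
    }

    std::size_t slots() const { return m_handles.size(); }

private:
    std::vector<Handle> m_handles;  // index = trace code, 0 = unassigned
};
```

As the commit message notes, the table uses more memory than a sparse map when codes have gaps, which is acceptable when the table stays small.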
Wilson Snyder	e07e9390f6	Internals: clang-format cleanups. No functional change.	2020-04-04 14:09:21 -04:00

75 Commits