verilator

Author	SHA1	Message	Date
Wilson Snyder	819e8741cc	Merge branch 'master' into develop-v5	2022-08-30 00:20:21 -04:00
Wilson Snyder	6a5f77b278	Internals: Cleanup some string/model constructors. No functional change.	2022-08-29 23:50:32 -04:00
Wilson Snyder	2358ced061	Rename tracing rolloverSize and add test (#3570).	2022-08-28 08:25:02 -04:00
Geza Lore	5c356a4680	Merge branch 'master' into develop-v5	2022-08-22 14:32:06 +01:00
Krzysztof Bieganski	39af5d020e	Timing support (#3363) Adds timing support to Verilator. It makes it possible to use delays, event controls within processes (not just at the start), wait statements, and forks. Building a design with those constructs requires a compiler that supports C++20 coroutines (GCC 10, Clang 5). The basic idea is to have processes and tasks with delays/event controls implemented as C++20 coroutines. This allows us to suspend and resume them at any time. There are five main runtime classes responsible for managing suspended coroutines: * `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle` with move semantics and automatic cleanup. * `VlDelayScheduler`, for coroutines suspended by delays. It resumes them at a proper simulation time. * `VlTriggerScheduler`, for coroutines suspended by event controls. It resumes them if its corresponding trigger was set. * `VlForkSync`, used for syncing `fork..join` and `fork..join_any` blocks. * `VlCoroutine`, the return type of all verilated coroutines. It allows for suspending a stack of coroutines (normally, C++ coroutines are stackless). There is a new visitor in `V3Timing.cpp` which: * scales delays according to the timescale, * simplifies intra-assignment timing controls and net delays into regular timing controls and assignments, * simplifies wait statements into loops with event controls, * marks processes and tasks with timing controls in them as suspendable, * creates delay, trigger scheduler, and fork sync variables, * transforms timing controls and fork joins into C++ awaits. There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`) that integrate static scheduling with timing. This involves providing external domains for variables, so that the necessary combinational logic gets triggered after coroutine resumption, as well as statements that need to be injected into the design eval function to perform this resumption at the correct time. There is also a function that transforms forked processes into separate functions. See the comments in `verilated_timing.h`, `verilated_timing.cpp`, `V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals documentation for more details. Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-22 13:26:32 +01:00
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Geza Lore	1404319b28	Merge branch 'master' into develop-v5	2022-08-19 13:39:44 +01:00
Wilson Snyder	1e2219347e	Internals: Cleanup ifdef, move up not under compiler version ifdef	2022-08-11 17:41:43 -04:00
Geza Lore	a4fd6d38fb	Add operator != to VlWide This is required by VlUnpacked::neq	2022-08-07 13:13:28 +01:00
Geza Lore	c266739e9f	Merge branch 'master' into develop-v5	2022-08-05 12:17:57 +01:00
Geza Lore	96a4b3e5a5	Update clang-format config and apply - Regroup and sort #include directives (like we used to, but automatic) - Set AlwaysBreakTemplateDeclarations to true	2022-08-05 12:00:24 +01:00
Geza Lore	39d1a62f9e	Fix change detection on unpacked arrays Expand array assignment when creating the trigger, as V3Expand might mangle it otherwise.	2022-08-02 13:01:41 +01:00
Wilson Snyder	3c54d5df70	Merge branch 'master' into develop-v5	2022-07-30 14:42:51 -04:00
Wilson Snyder	f91793e931	Revert - SC overrides cause non-override clang error.	2022-07-30 13:53:54 -04:00
Wilson Snyder	daac7cb90d	Merge branch 'master' into develop-v5	2022-07-30 12:09:05 -04:00
Wilson Snyder	a2d26b45bb	Internals: Fix some clang-tidy issues. No functional change intended.	2022-07-30 11:54:28 -04:00
Wilson Snyder	dce8f3d25d	Internals: Spacing from develop-v5. No functional change.	2022-07-30 11:54:28 -04:00
Geza Lore	38e5b6c1ad	Replace __gcov_flush with __gcov_dump __gcov_flush was a private function and was removed from later GCC versions (at least from 11.2.0, possibly earlier). Replace with the documented public __gcov_dump.	2022-07-30 16:02:03 +01:00
Wilson Snyder	4859f5e1fa	Merge branch 'master' into develop-v5	2022-07-30 10:26:16 -04:00
Wilson Snyder	b9d7819faa	Internals: Fix some cppcheck issues. Some dump functions fixed.	2022-07-30 10:01:39 -04:00
Geza Lore	ad2fbfe62d	Merge branch 'master' into develop-v5	2022-07-29 12:04:24 +01:00
Gustav Svensk	eeef5ab4de	Fix sformat string incorrectly cleared (#3515) (#3519).	2022-07-25 17:36:34 +02:00
Geza Lore	386401da60	Merge branch 'master' into develop-v5	2022-07-22 15:09:20 +01:00
Geza Lore	e0b61ceabd	Remove legacy #ifdef SYSTEMC_64BIT_PATCHES These days this is always false, see #3505	2022-07-21 15:01:17 +01:00
Geza Lore	f9ecbdc70b	Merge branch 'master' into develop-v5	2022-07-21 09:56:14 +01:00
Geza Lore	30e3edb81d	Remove deprecated and unused timescale override defines These have been 'deprecated' for 2 years and are otherwise unused except for using a temporary placeholder value, which I have inlined with the default value. Also remove the now-unused VL_TIME_STR_CONVERT utility function (and corresponding unit tests), which has no references in any project on GitHub.	2022-07-20 14:06:09 +01:00
Geza Lore	1d400dd98c	Configure tracing at run-time, instead of compile time (#3504) All remaining uses of conditional compilation in the tracing implementation of the run-time library are replaced with the use of VerilatedModel::traceConfig, so this configuration is now done at run-time.	2022-07-20 11:27:10 +01:00
Geza Lore	a4ed3c2086	Make parallel tracing switchable at run-time	2022-07-19 17:13:13 +01:00
Geza Lore	efb5caad22	Improve robustness of trace configuration Always fail if adding a model to a trace file that has already executed a dump. We used to do this before as well, though in a less robust way. We will be relying on this property more in the future, so improve the check.	2022-07-19 14:16:08 +01:00
Geza Lore	3a002b6cf2	Remove VerilatedVcd::m_evcd and related dead code. The legacy code that was using this was removed earlier, and m_evcd was constant false, so removed.	2022-07-19 13:58:18 +01:00
Geza Lore	f8b7981be4	Make use of FST writer thread switchable at run-time. Always build the FST library with -DFST_WRITER_PARALLEL, iff VL_THREADED. This supports run-time enablement of the FST writer thread, and has no measurable performance impact on single threaded tracing but simplifies the library build. Note: the actual choice of using the FST writer thread is still compile time, but can now be made run-time easily.	2022-07-19 13:48:03 +01:00
Geza Lore	b55ee79d86	Fix typo	2022-07-19 12:36:21 +01:00
Geza Lore	db59c07f27	Implement trace offloading with fewer ifdefs Step towards a proper run-time library. Reduce the amount of ifdefs in the implementation of offloaded tracing. There are still a very small number of ifdefs left, which will need more careful changes in order to keep user API compatibility.	2022-07-19 11:31:35 +01:00
Geza Lore	9085e34d70	Pass VerilatedModel at trace registration time	2022-07-19 11:00:09 +01:00
Geza Lore	c28bf9ce24	Fix change detection over unpacked arrays.	2022-07-18 12:25:22 +01:00
Geza Lore	c9ac9a75a6	Merge branch 'master' into develop-v5	2022-07-12 17:29:45 +01:00
Geza Lore	79c901c220	Tighten signatures/implementation of VerilatedModel abstract methods.	2022-07-12 16:06:08 +01:00
Geza Lore	b61d819fcb	Move contextp() under VerilatedModel	2022-07-12 16:06:08 +01:00
Geza Lore	f4038e3674	Move thread pool and execution profiler into the context. (#3477) Fixes #3454	2022-07-12 11:41:15 +01:00
Arkadiusz Kozdra	8377514127	Add support for $test$plusargs(expr) (#3489)	2022-07-11 06:21:35 -04:00
Geza Lore	0de1bbc85b	Add and use VL_CONSTEXPR_CXX17	2022-07-05 14:21:28 +01:00
Geza Lore	42b711b862	Don't use 'assert' in profiler initialization	2022-07-05 12:18:54 +01:00
Wilson Snyder	b25b798dbe	Merge branch 'master' into develop-v5	2022-07-04 13:20:03 -04:00
Geza Lore	1bb6433649	Improve worker thread shutdown. Always ensure worker thread task queue is drained before shutting down.	2022-06-27 15:03:36 +01:00
Wilson Snyder	fc4d6a62af	Remove VL_PROFILER ifdef. Partial (#3454).	2022-06-22 20:06:23 -04:00
Wilson Snyder	49455721a3	Commentary	2022-06-21 19:28:23 -04:00
Wilson Snyder	0f324c8309	Merge branch 'master' into develop-v5	2022-06-04 11:59:49 -04:00
Geza Lore	b51f887567	Perform VCD tracing in parallel when using --threads (#3449) VCD tracing is now parallelized using the same thread pool as the model. We achieve this by breaking the top level trace functions into multiple top level functions (as many as --threads), and after emitting the time stamp to the VCD file on the main thread, we execute the tracing functions in parallel on the same thread pool as the model (which we pass to the trace file during registration), tracing into a secondary per thread buffer. The main thread will then stitch (memcpy) the buffers together into the output file. This makes the `--trace-threads` option redundant with `--trace`, which now only affects `--trace-fst`. FST tracing uses the previous offloading scheme. This obviously helps a lot in VCD tracing performance, and I have seen better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on OpenTitan 4T).	2022-05-29 19:08:39 +01:00
Geza Lore	c4b8675d77	Always inline some small, hot trace routines	2022-05-28 12:47:09 +01:00
Geza Lore	a7cd7a1ed9	Initialize VerilatedTrace members in class	2022-05-28 12:47:07 +01:00
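
Commit 9ac64d0b92 above mentions replacing a sorted map with hand-written pairing heaps whose nodes are stored directly in the objects they relate to. As a rough illustration of that idea only, here is a minimal max-oriented pairing heap sketch. It is not Verilator's implementation: the names `Node`, `meld`, `mergePairs`, `removeMax`, and `drainScores` are all hypothetical, and the recursive merge is chosen for brevity rather than for very long child lists.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type: the heap links live inside the element itself,
// so no separate associative container is needed to find its heap node.
struct Node {
    int score;
    Node* child = nullptr;    // First child in the heap
    Node* sibling = nullptr;  // Next sibling in the parent's child list
    explicit Node(int s) : score{s} {}
};

// Meld two heap roots in O(1); the root with the larger score wins.
Node* meld(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (a->score < b->score) std::swap(a, b);
    b->sibling = a->child;  // The loser becomes the first child of the winner
    a->child = b;
    return a;
}

// Pair up the children left to right, then meld the pairs back together.
Node* mergePairs(Node* first) {
    if (!first) return nullptr;
    if (!first->sibling) return first;
    Node* const second = first->sibling;
    Node* const rest = second->sibling;
    first->sibling = nullptr;
    second->sibling = nullptr;
    return meld(meld(first, second), mergePairs(rest));
}

// Detach the current maximum (the root) and return the new root.
Node* removeMax(Node* root) {
    Node* const children = root->child;
    root->child = nullptr;
    return mergePairs(children);
}

// Usage helper: insert all scores, then pop them in descending order.
std::vector<int> drainScores(const std::vector<int>& scores) {
    std::vector<Node> nodes;
    nodes.reserve(scores.size());  // Keep node addresses stable
    for (const int s : scores) nodes.emplace_back(s);
    Node* root = nullptr;
    for (Node& n : nodes) root = meld(root, &n);  // Insertion is just a meld
    std::vector<int> out;
    while (root) {
        out.push_back(root->score);
        root = removeMax(root);
    }
    return out;
}
```

Insertion only melds a single node into the root, so candidates that are never popped are never fully sorted, which is the property the commit message credits for avoiding work on merge candidates that are never considered.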

856 Commits