verilator

Author	SHA1	Message	Date
Wilson Snyder	d162619bd3	Merge branch 'master' into develop-v5	2022-09-20 20:06:21 -04:00
Wilson Snyder	fc4ffd454e	Rename --bin to --build-dep-bin.	2022-09-18 10:32:43 -04:00
Geza Lore	27031ed688	Merge branch 'master' into develop-v5	2022-09-15 10:28:35 +01:00
Wilson Snyder	75fd71d7e5	Add --main to generate main() C++ (previously was experimental only) (#3265 ).	2022-09-14 20:18:40 -04:00
Wilson Snyder	9efd64ab98	Commentary	2022-09-14 20:13:28 -04:00
Wilson Snyder	7aa01625d8	Commentary: Changes update	2022-09-14 08:15:42 -04:00
Wilson Snyder	81fe35ee2e	Fix typedef'ed class conversion to boolean (#3616 ).	2022-09-12 18:03:56 -04:00
Geza Lore	fd6275a62b	Merge branch 'master' into develop-v5	2022-09-05 17:03:43 +01:00
Geza Lore	d42a2d6494	Fix V3Gate crash on circular logic The recent patch to defer substitutions on V3Gate crashes on circular logic that has cycle length >= 3 with all inlineable signals (cycle length 2 is detected correctly and is not inlined). Fix by stopping recursion at the loop-back edge. Fixes #3543	2022-09-02 19:58:58 +01:00
Wilson Snyder	849bb5590a	Merge branch 'master' into develop-v5	2022-08-31 19:51:07 -04:00
Wilson Snyder	8d0c06e570	devel release	2022-08-31 19:49:24 -04:00
Wilson Snyder	5b2fbf4f37	Version bump	2022-08-31 19:46:45 -04:00
Wilson Snyder	592dab2bdb	Commentary: Changes update	2022-08-31 19:27:43 -04:00
Wilson Snyder	51daa64e9a	Fix --hierarchical with order-based pin connections (#3585 ).	2022-08-31 18:12:21 -04:00
Wilson Snyder	819e8741cc	Merge branch 'master' into develop-v5	2022-08-30 00:20:21 -04:00
Wilson Snyder	c335aad25f	Fix --hierarchical with order-based pin connections (#3583 ).	2022-08-29 22:49:19 -04:00
Wilson Snyder	2358ced061	Rename tracing rolloverSize and add test (#3570 ).	2022-08-28 08:25:02 -04:00
Geza Lore	5c356a4680	Merge branch 'master' into develop-v5	2022-08-22 14:32:06 +01:00
Krzysztof Bieganski	39af5d020e	Timing support (#3363 ) Adds timing support to Verilator. It makes it possible to use delays, event controls within processes (not just at the start), wait statements, and forks. Building a design with those constructs requires a compiler that supports C++20 coroutines (GCC 10, Clang 5). The basic idea is to have processes and tasks with delays/event controls implemented as C++20 coroutines. This allows us to suspend and resume them at any time. There are five main runtime classes responsible for managing suspended coroutines: * `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle` with move semantics and automatic cleanup. * `VlDelayScheduler`, for coroutines suspended by delays. It resumes them at a proper simulation time. * `VlTriggerScheduler`, for coroutines suspended by event controls. It resumes them if its corresponding trigger was set. * `VlForkSync`, used for syncing `fork..join` and `fork..join_any` blocks. * `VlCoroutine`, the return type of all verilated coroutines. It allows for suspending a stack of coroutines (normally, C++ coroutines are stackless). There is a new visitor in `V3Timing.cpp` which: * scales delays according to the timescale, * simplifies intra-assignment timing controls and net delays into regular timing controls and assignments, * simplifies wait statements into loops with event controls, * marks processes and tasks with timing controls in them as suspendable, * creates delay, trigger scheduler, and fork sync variables, * transforms timing controls and fork joins into C++ awaits There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`) that integrate static scheduling with timing. This involves providing external domains for variables, so that the necessary combinational logic gets triggered after coroutine resumption, as well as statements that need to be injected into the design eval function to perform this resumption at the correct time. There is also a function that transforms forked processes into separate functions. See the comments in `verilated_timing.h`, `verilated_timing.cpp`, `V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals documentation for more details. Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-22 13:26:32 +01:00
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Wilson Snyder	ebb37b0156	Merge branch 'master' into develop-v5	2022-08-20 14:02:09 -04:00
Wilson Snyder	90dc04cf93	Add --future0 and --future1 options.	2022-08-20 14:01:13 -04:00
Geza Lore	4d81eb021d	Revert "Improve performance of MTask coarsening" This reverts commit `83475008d9`.	2022-08-19 18:03:45 +01:00
Geza Lore	83475008d9	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-19 16:59:20 +01:00
Geza Lore	1404319b28	Merge branch 'master' into develop-v5	2022-08-19 13:39:44 +01:00
Wilson Snyder	f435d96241	Fix case statement comparing string literal (#3544 ).	2022-08-15 21:56:09 -04:00
Wilson Snyder	cbe1b8e266	Fix segfault exporting non-existant package (#3535 ).	2022-08-08 17:53:50 -04:00
Geza Lore	ad2fbfe62d	Merge branch 'master' into develop-v5	2022-07-29 12:04:24 +01:00
Yutetsu TAKATSUKASA	1f9323d086	Set correct dtype in replaceShiftSame() (#3520 ) * Tests: Add a test to reproduce bug3399 * Fix3399. Set the correct dtype in replaceShiftSame(). * Tests: update stats. * Update Changes	2022-07-29 07:05:04 +09:00
Yutetsu TAKATSUKASA	60eab3eb8c	Fix wrong result of bit op tree optimization #3509 (#3516 ) * Tests: Add a test to reproduce #3509 * Tests: Compile without tautological-compare check because bit op tree optimization is disabled in the test. * Internals: Dedup code. No functional change is intended. * Fix #3509. "2'b10 == (2'b11 & {1'b0, val[0]})" and "2'b10 != (2'b11 & {1'b0, val[0]})" were wrongly optimized to "!val[0]" and "val[0]" respectively. Now properly optimize them to 1'b0 and 1'b1. * Commentary * Commentary: Update Changes	2022-07-24 19:54:37 +09:00
Geza Lore	c9ac9a75a6	Merge branch 'master' into develop-v5	2022-07-12 17:29:45 +01:00
Wilson Snyder	5f3316d3dc	* Fix empty string arguments to display (#3484 ).	2022-07-09 08:30:57 -04:00
Wilson Snyder	a4fddb3fbe	Fix table misoptimizing away display (#3488 ).	2022-07-09 07:55:46 -04:00
Yutetsu TAKATSUKASA	9f37cef1bb	Fix #3470 of incorrect bit op tree optimization (#3476 ) * Tests: Add a test to reproduce #3470 * Update LSB during return path of traversal. No functional change is intended. * Introduce LeafInfo::m_msb * Update LeafInfo::m_msb when visitin AstCCast * Internals: Add comment, reorder. No functional change is intended. * Delete explicit from copy constructor to fix build error. * Update Changes * Internals: Remove unused parameter. No functional change is intended. * Tests: Add explanation to t_const_opt.	2022-07-06 08:33:37 +09:00
Wilson Snyder	e7ca4a69e3	Merge branch 'master' into develop-v5	2022-06-19 15:22:09 -04:00
Wilson Snyder	f2fba51fe2	devel release	2022-06-19 15:13:29 -04:00
Wilson Snyder	7c79f0d431	Version bump (Changes update)	2022-06-19 15:10:23 -04:00
Wilson Snyder	1fa82ffb7b	Commentary: Update ChangeLog	2022-06-19 15:00:10 -04:00
Wilson Snyder	e7dc2de14b	Fix BLKANDNBLK on $readmem/$writemem (#3379 ).	2022-06-04 12:43:18 -04:00
Wilson Snyder	0f324c8309	Merge branch 'master' into develop-v5	2022-06-04 11:59:49 -04:00
Wilson Snyder	59dc2853e3	Support concat assignment to packed array (#3446 ).	2022-06-03 21:32:13 -04:00
Wilson Snyder	ada58465b2	Add -f<optimization> options to replace -O<letter> options (#3436 ).	2022-06-03 20:43:16 -04:00
Wilson Snyder	173f57c636	Changed --no-merge-const-pool to -fno-merge-const-pool (#3436 ).	2022-06-03 19:41:59 -04:00
Geza Lore	b51f887567	Perform VCD tracing in parallel when using --threads (#3449 ) VCD tracing is now parallelized using the same thread pool as the model. We achieve this by breaking the top level trace functions into multiple top level functions (as many as --threads), and after emitting the time stamp to the VCD file on the main thread, we execute the tracing functions in parallel on the same thread pool as the model (which we pass to the trace file during registration), tracing into a secondary per thread buffer. The main thread will then stitch (memcpy) the buffers together into the output file. This makes the `--trace-threads` option redundant with `--trace`, which now only affects `--trace-fst`. FST tracing uses the previous offloading scheme. This obviously helps a lot in VCD tracing performance, and I have seen better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on OpenTitan 4T).	2022-05-29 19:08:39 +01:00
Geza Lore	0722f47539	Improve V3MergeCond by reordering statements (#3125 ) V3MergeCond merges consecutive conditional `_ = cond ? _ : _` and `if (cond) ...` statements. This patch adds an analysis and ordering phase that moves statements with identical conditions closer to each other, in order to enable more merging opportunities. This in turn eliminates a lot of repeated conditionals which reduced dynamic branch count and branch misprediction rate. Observed 6.5% improvement on multi-threaded large designs, at the cost of less than 2% increase in Verilation speed.	2022-05-27 16:57:51 +01:00
Krzysztof Bieganski	d7a75dc026	Merge branch 'master' into develop-v5	2022-05-25 11:06:38 +02:00
Wilson Snyder	530817191e	Support non-ANSI interface port declarations (#3439 ).	2022-05-25 00:50:50 -04:00
Geza Lore	c7610ed044	Fix FST tracing thread in CMake build	2022-05-20 17:04:46 +01:00
Geza Lore	ffc1c51526	Commentary	2022-05-20 16:44:53 +01:00
Geza Lore	b130a8cfeb	Add -DVM_TRACE_VCD in model builds with Make with --trace	2022-05-20 16:44:38 +01:00

1 2 3 4 5 ...

1779 Commits