verilator

Author	SHA1	Message	Date
Wilson Snyder	8d0c06e570	devel release	2022-08-31 19:49:24 -04:00
Wilson Snyder	5b2fbf4f37	Version bump	2022-08-31 19:46:45 -04:00
Wilson Snyder	592dab2bdb	Commentary: Changes update	2022-08-31 19:27:43 -04:00
Wilson Snyder	51daa64e9a	Fix --hierarchical with order-based pin connections (#3585 ).	2022-08-31 18:12:21 -04:00
Geza Lore	c0f9b0d8f6	V3Partition: Refactor initialization of MTask dependencies No functional change	2022-08-31 16:54:04 +01:00
Geza Lore	505bba14eb	Improve PartFixDataHazards for clarity and speed. - Use modern C++ - Implement OrderLogicVertex->LogicMTask map with OrderLogicVertex::userp(), insteas of std::unordered_map - Simplify data structures - Simplify code and assert properties No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	ebbe24966c	Remove unnecessary virtual methods	2022-08-31 16:52:05 +01:00
Geza Lore	881c3f6e40	Minor optimization of PartContraction Remove rarely used debug code from initialization loop.	2022-08-31 16:52:05 +01:00
Geza Lore	546aeab9f2	V3Order: Minor refactoring for clarity Refactor ProcessMoveBuildGraph utilizing the fact that OrderGraph is a bipartite graph, also remove unnecessary unordered_map and distribute variable domain map. No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	8de21e9bb7	Document and ensure OrderGraph is bipartite Minor refactoring and documentation. No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	2ecda74471	Merge branch 'master' into develop-v5	2022-08-31 10:45:18 +01:00
Aleksander Kiryk	2136afde6b	Support negated properties (#3572 )	2022-08-30 06:33:42 -04:00
Wilson Snyder	ea55db7286	Internals: Cleanup some string constructors. No functional change.	2022-08-30 01:02:39 -04:00
Wilson Snyder	819e8741cc	Merge branch 'master' into develop-v5	2022-08-30 00:20:21 -04:00
Wilson Snyder	6a5f77b278	Internals: Cleanup some string/model constructors. No functional change.	2022-08-29 23:50:32 -04:00
Wilson Snyder	8658a0d7dc	Internals: Constructor format update. No functional change.	2022-08-29 23:05:52 -04:00
Wilson Snyder	c335aad25f	Fix --hierarchical with order-based pin connections (#3583 ).	2022-08-29 22:49:19 -04:00
Wilson Snyder	9d9d647c1f	Fix indentation of --protect import function SV code.	2022-08-29 22:28:02 -04:00
Wilson Snyder	d47a37fb76	Internals: Cleanup constructors etc. No functional change.	2022-08-29 22:17:27 -04:00
Aleksander Kiryk	24ec84851a	Support $sampled (#3569 )	2022-08-29 08:39:41 -04:00
Arkadiusz Kozdra	0a3a15a66e	Support class parameters (#2231 ) (#3541 )	2022-08-28 10:24:55 -04:00
Wilson Snyder	2358ced061	Rename tracing rolloverSize and add test (#3570 ).	2022-08-28 08:25:02 -04:00
Krzysztof Bieganski	2af5304884	Fix tracing of slow coroutines (#3576 part) (#3579 )	2022-08-26 05:11:44 -05:00
Varun Koyyalagunta	5869fdf7f6	Fix $dump systemtask with --output-split-cfuncs (#3495 ) (#3497 )	2022-08-25 18:29:11 -05:00
Krzysztof Bieganski	1a1d2ecfd9	Enable tracing in generated main (#3578 ) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-25 14:55:37 +01:00
Geza Lore	5c356a4680	Merge branch 'master' into develop-v5	2022-08-22 14:32:06 +01:00
Krzysztof Bieganski	39af5d020e	Timing support (#3363 ) Adds timing support to Verilator. It makes it possible to use delays, event controls within processes (not just at the start), wait statements, and forks. Building a design with those constructs requires a compiler that supports C++20 coroutines (GCC 10, Clang 5). The basic idea is to have processes and tasks with delays/event controls implemented as C++20 coroutines. This allows us to suspend and resume them at any time. There are five main runtime classes responsible for managing suspended coroutines: * `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle` with move semantics and automatic cleanup. * `VlDelayScheduler`, for coroutines suspended by delays. It resumes them at a proper simulation time. * `VlTriggerScheduler`, for coroutines suspended by event controls. It resumes them if its corresponding trigger was set. * `VlForkSync`, used for syncing `fork..join` and `fork..join_any` blocks. * `VlCoroutine`, the return type of all verilated coroutines. It allows for suspending a stack of coroutines (normally, C++ coroutines are stackless). There is a new visitor in `V3Timing.cpp` which: * scales delays according to the timescale, * simplifies intra-assignment timing controls and net delays into regular timing controls and assignments, * simplifies wait statements into loops with event controls, * marks processes and tasks with timing controls in them as suspendable, * creates delay, trigger scheduler, and fork sync variables, * transforms timing controls and fork joins into C++ awaits There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`) that integrate static scheduling with timing. This involves providing external domains for variables, so that the necessary combinational logic gets triggered after coroutine resumption, as well as statements that need to be injected into the design eval function to perform this resumption at the correct time. There is also a function that transforms forked processes into separate functions. See the comments in `verilated_timing.h`, `verilated_timing.cpp`, `V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals documentation for more details. Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-22 13:26:32 +01:00
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Wilson Snyder	7cc89b8b42	Merge branch 'master' into develop-v5	2022-08-20 14:19:45 -04:00
Wilson Snyder	c6607724cb	Fix clang warning.	2022-08-20 14:19:00 -04:00
Wilson Snyder	ebb37b0156	Merge branch 'master' into develop-v5	2022-08-20 14:02:09 -04:00
Wilson Snyder	90dc04cf93	Add --future0 and --future1 options.	2022-08-20 14:01:13 -04:00
Krzysztof Bieganski	10cf492946	Add support for expressions in event controls (#3550 ) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-19 20:18:38 +02:00
Geza Lore	4d81eb021d	Revert "Improve performance of MTask coarsening" This reverts commit `83475008d9`.	2022-08-19 18:03:45 +01:00
Geza Lore	83475008d9	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-19 16:59:20 +01:00
Geza Lore	03ac7ad730	Make PartPropagateCp specific to the MTask graph While keeping the client code abstract in PartPropagateCp is nice for testing, there is performance to be had removing the abstraction. As this code dominates in scheduling large designs, we eliminate the abstraction and re-work the testing to use the actual LogicMTask and MTaskEdge graph types. No functional change intended.	2022-08-19 14:06:11 +01:00
Geza Lore	cd50949a7e	Reuse MTaskEdge instances in MT scheduling Instead of deleting then re-allocating MTaskEdge instances when merging two MTasks, just redirect the edged of the donor MTask to the recipient MTask. This is both faster as it avoids an allocation and a deletion, together with one update of the sibling maps, and also makes the algorithm more stable due to MergeCandidate IDs being stable and allocated up front for all MTaskEdges, before any SiblingMCs are allocated. Perturbations in output are expected as the IDs used to break ties between merge candidates with equal costs are not updated when redirecting an edge (on purpose). The relinking of only one end of the graph edges also perturbs the order in which they are enumerated, which does change candidate opportunities when the number of edges is larger than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when IDs are updated and edges are updated to appear in their original order.	2022-08-19 14:06:11 +01:00
Geza Lore	f0040c7b9a	Remove reliance on pointer comparison in MT scheduling The critical path propagation used to rely on a pointer comparison to break equal scoring critical path updates. Use the corresponding mtask ids instead, which is deterministic across invocations.	2022-08-19 14:06:11 +01:00
Geza Lore	f8a0389e73	Do not use stepCost when gathering sibling merge candidates siblingPairFromRelatives gathers neighbours of a vertex, and sorts them. It then takes the N best nodes, and creates sibling merge candidates from them. We now use the unadjusted cost instead of the step cost of the vertices when sorting. This is both faster as we need not do the log-space rounding to compute stepCost, and will also make similar but yet cheaper nodes appear closer to the front as we don't lose precision in rounding, hence they are more likely to be entered as merge candidates. Note that when creating the merge candidate, we still use the stepCost, so it's purpose of reducing the propagation of critical path updates is maintained in full. In summary, this should make both Verilator and the generated model very slightly faster, at least in theory, and I have observed minor improvement in places.	2022-08-19 14:06:11 +01:00
Geza Lore	b436794773	Add specialized GraphStreamUnordered GraphStreamUnordered used to be GraphStream<std::less<const V3GraphVertex*>>, but a lot of performance improvements can be had by a specialized implementation, so added a highly optimized one. This helps a lot with --debug-partition.	2022-08-19 14:06:11 +01:00
Geza Lore	1404319b28	Merge branch 'master' into develop-v5	2022-08-19 13:39:44 +01:00
Geza Lore	90d22cbec6	Fix `AstNode::exists` return type	2022-08-19 13:22:06 +01:00
Krzysztof Bieganski	33e2acfe61	Fix `AstNode::forall` return type (#3559 ) Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>	2022-08-19 12:33:17 +01:00
Ryszard Rozak	db5fdfb0ee	Fix === with some tristate constants (#3551 ).	2022-08-18 07:03:05 -04:00
Krzysztof Bieganski	951cd73fe0	Handle MemberSel in V3EmitV.cpp (#3555 )	2022-08-18 06:33:45 -04:00
Arkadiusz Kozdra	0eeb40b975	Fix converting subclasses to string (#3552 )	2022-08-17 18:08:43 -04:00
Wilson Snyder	93272c13fd	Tests: Confirm fixed (#181 )	2022-08-15 22:17:36 -04:00
Wilson Snyder	43abaeb055	Tests: Confirm fixed (#485 )	2022-08-15 22:17:17 -04:00
Wilson Snyder	18b9e661c9	Tests: Confirm fixed (#446 )	2022-08-15 22:17:09 -04:00
Wilson Snyder	f435d96241	Fix case statement comparing string literal (#3544 ).	2022-08-15 21:56:09 -04:00

1 2 3 4 5 ...

5396 Commits