verilator

Author	SHA1	Message	Date
Geza Lore	ddb678cc5b	Merge branch 'master' into develop-v5	2022-09-22 17:33:36 +01:00
Geza Lore	63c694f65f	Streamline dump control options - Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a special case of `--dumpi-<tag>` where tag can be a magic word, or a filename - Control dumping via static `dump*()` functions, analogous to `debug()` - Make dumping independent of the value of `debug()` (so dumping always works even without the debug flag) - Add separate `--dumpi-graph` for dumping V3Graphs, which is again a special case of `--dumpi-<tag>` - Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before	2022-09-22 17:24:41 +01:00
Geza Lore	95145038b4	Generate AstNode accessors via astgen Introduce the @astgen directives parsed by astgen, currently used for the generation child node (operand) accessors. Please see the updated internal documentation for details.	2022-09-21 14:05:27 +01:00
Geza Lore	ce03293128	Generate AstNode accessors via astgen Introduce the @astgen directives parsed by astgen, currently used for the generation child node (operand) accessors. Please see the updated internal documentation for details.	2022-09-21 13:56:03 +01:00
Wilson Snyder	a214fd1f78	Internals: Fix constructor syntax in new develop-v5 code	2022-09-17 08:56:41 -04:00
Geza Lore	af305bf280	Merge branch 'master' into develop-v5	2022-09-16 16:24:36 +01:00
Geza Lore	0c70a0dcbf	Remove redundant 'virtual' keywords from overridden methods 'virtual' is redundant when 'override' is present, so keep only 'override'. Add t/t_dist_cppstyle.pl to check for this.	2022-09-16 15:19:38 +01:00
Geza Lore	90ab746a42	Make it possible to parallelize ico and act scheduling sections Small fixup patch so the 'ico' and 'act' scheduling sections could be ordered as multi-threaded. However, we still only order these single threaded at the moment (but switching them to multi-threaded now works).	2022-09-06 16:01:13 +01:00
Geza Lore	298f71f2b1	Merge branch 'master' into develop-v5	2022-09-02 12:19:35 +01:00
Geza Lore	5c828b7e60	V3Partition: use V3Lists to keep track of SiblingMCs Replace std::set<SiblingMC> with V3Lists to keep track of SiblingMCs associated with MTasks, use a std::set<LogicMTask*> for ensuring uniqueness. This yields a bit more speed in PartContraction.	2022-09-01 19:40:44 +01:00
Geza Lore	4640bea31a	V3Partition: More improvements for PartFixDataHazards - Remove redundant loop through the MTask graph - Gather variables directly from the OrderGraph, which is simpler and faster.	2022-09-01 16:30:04 +01:00
Geza Lore	875361d7ce	V3Partition: Reduce working set size of PartContraction (#3587 ) This yields an additional 25% speedup of MT scheduling.	2022-09-01 16:29:40 +01:00
Geza Lore	c0f9b0d8f6	V3Partition: Refactor initialization of MTask dependencies No functional change	2022-08-31 16:54:04 +01:00
Geza Lore	505bba14eb	Improve PartFixDataHazards for clarity and speed. - Use modern C++ - Implement OrderLogicVertex->LogicMTask map with OrderLogicVertex::userp(), insteas of std::unordered_map - Simplify data structures - Simplify code and assert properties No functional change.	2022-08-31 16:52:05 +01:00
Geza Lore	ebbe24966c	Remove unnecessary virtual methods	2022-08-31 16:52:05 +01:00
Geza Lore	881c3f6e40	Minor optimization of PartContraction Remove rarely used debug code from initialization loop.	2022-08-31 16:52:05 +01:00
Geza Lore	5c356a4680	Merge branch 'master' into develop-v5	2022-08-22 14:32:06 +01:00
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Wilson Snyder	ebb37b0156	Merge branch 'master' into develop-v5	2022-08-20 14:02:09 -04:00
Geza Lore	4d81eb021d	Revert "Improve performance of MTask coarsening" This reverts commit `83475008d9`.	2022-08-19 18:03:45 +01:00
Geza Lore	83475008d9	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-19 16:59:20 +01:00
Geza Lore	03ac7ad730	Make PartPropagateCp specific to the MTask graph While keeping the client code abstract in PartPropagateCp is nice for testing, there is performance to be had removing the abstraction. As this code dominates in scheduling large designs, we eliminate the abstraction and re-work the testing to use the actual LogicMTask and MTaskEdge graph types. No functional change intended.	2022-08-19 14:06:11 +01:00
Geza Lore	cd50949a7e	Reuse MTaskEdge instances in MT scheduling Instead of deleting then re-allocating MTaskEdge instances when merging two MTasks, just redirect the edged of the donor MTask to the recipient MTask. This is both faster as it avoids an allocation and a deletion, together with one update of the sibling maps, and also makes the algorithm more stable due to MergeCandidate IDs being stable and allocated up front for all MTaskEdges, before any SiblingMCs are allocated. Perturbations in output are expected as the IDs used to break ties between merge candidates with equal costs are not updated when redirecting an edge (on purpose). The relinking of only one end of the graph edges also perturbs the order in which they are enumerated, which does change candidate opportunities when the number of edges is larger than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when IDs are updated and edges are updated to appear in their original order.	2022-08-19 14:06:11 +01:00
Geza Lore	f0040c7b9a	Remove reliance on pointer comparison in MT scheduling The critical path propagation used to rely on a pointer comparison to break equal scoring critical path updates. Use the corresponding mtask ids instead, which is deterministic across invocations.	2022-08-19 14:06:11 +01:00
Geza Lore	f8a0389e73	Do not use stepCost when gathering sibling merge candidates siblingPairFromRelatives gathers neighbours of a vertex, and sorts them. It then takes the N best nodes, and creates sibling merge candidates from them. We now use the unadjusted cost instead of the step cost of the vertices when sorting. This is both faster as we need not do the log-space rounding to compute stepCost, and will also make similar but yet cheaper nodes appear closer to the front as we don't lose precision in rounding, hence they are more likely to be entered as merge candidates. Note that when creating the merge candidate, we still use the stepCost, so it's purpose of reducing the propagation of critical path updates is maintained in full. In summary, this should make both Verilator and the generated model very slightly faster, at least in theory, and I have observed minor improvement in places.	2022-08-19 14:06:11 +01:00
Geza Lore	c266739e9f	Merge branch 'master' into develop-v5	2022-08-05 12:17:57 +01:00
Geza Lore	96a4b3e5a5	Update clang-format config and apply - Regroup and sort #include directives (like we used to, but automatic) - Set AlwaysBreakTemplateDeclarations to true	2022-08-05 12:00:24 +01:00
Geza Lore	7403226a97	Merge branch 'master' into develop-v5	2022-08-04 10:03:38 +01:00
Geza Lore	fac8e76923	Rework SortByValueMap for better performance Keep a single std::set of key/value pairs, and a single unordered_map from key to iterators into the set. Also improve some of the accessing mechanisms using modern C++. This speeds up multi-threaded ordering by about 10%.	2022-08-03 21:17:02 +01:00
Geza Lore	b864f5f5ba	V3Partition: use static_cast with LogicMTaskVertex dynamic_cast is not free, and the mtask graph contains only LogicMTaskVertex vertices, use static_cast instead for some speedup.	2022-08-03 17:05:01 +01:00
Wilson Snyder	4859f5e1fa	Merge branch 'master' into develop-v5	2022-07-30 10:26:16 -04:00
Wilson Snyder	b9d7819faa	Internals: Fix some cppcheck issues. Some dump functions fixed.	2022-07-30 10:01:39 -04:00
Geza Lore	582da6df9a	Merge branch 'master' into develop-v5	2022-07-14 10:08:52 +01:00
Geza Lore	87f1e06c41	Small algorithmic improvement of PartContraction::siblingPairFromRelatives Use std::partial_sort for the non-exhaustive case. This is O(n) instead of O(nlog(n)) in the size of the candidate list being sorted. (It actually is O(nlog(k)), but k is constant 6 in the non-exhaustive case).	2022-07-12 19:10:01 +01:00
Geza Lore	7e8bafd217	Remove static data use from PartContraction::siblingPairFromRelatives Use std::sort with lambda rather than qsort with static function and static data. Verilation performance neutral.	2022-07-12 19:09:40 +01:00
Geza Lore	282887d9c6	Fix code coverage holes Fixes #3422	2022-05-16 21:22:21 +01:00
Geza Lore	599d23697d	IEEE compliant scheduler (#3384 ) This is a major re-design of the way code is scheduled in Verilator, with the goal of properly supporting the Active and NBA regions of the SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4. With this change, all internally generated clocks should simulate correctly, and there should be no more need for the `clock_enable` and `clocker` attributes for correctness in the absence of Verilator generated library models (`--lib-create`). Details of the new scheduling model and algorithm are provided in docs/internals.rst. Implements #3278	2022-05-15 16:03:32 +01:00
HungMingWu	880a9be3b1	Internal: Add C++20ish reverse_view for range loops. No functional change (#3388 ). Signed-off-by: HungMingWu <u9089000@gmail.com>	2022-04-18 13:03:56 -04:00
Geza Lore	fbd568dc47	Prep for multiple AstExecGraph. No functional change.	2022-04-10 12:00:17 +01:00
Wilson Snyder	e02f97854c	Deprecate 'vluint64_t' and similar types (#3255 ).	2022-03-27 15:27:40 -04:00
Geza Lore	b1b5b5dfe2	Improve run-time profiling The --prof-threads option has been split into two independent options: 1. --prof-exec, for collecting verilator_gantt and other execution related profiling data, and 2. --prof-pgo, for collecting data needed for PGO The implementation of execution profiling is extricated from VlThreadPool and is now a separate class VlExecutionProfiler. This means --prof-exec can now be used for single-threaded models (though it does not measure a lot of things just yet). For consistency VerilatedProfiler is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are in verilated_profiler.{h/cpp}, but can be used completely independently. Also re-worked the execution profile format so it now only emits events without holding onto any temporaries. This is in preparation for some future optimizations that would be hindered by the introduction of function locals via AstText. Also removed the Barrier event. Clearing the profile buffers is not notably more expensive as the profiling records are trivially destructible.	2022-03-27 15:57:30 +02:00
Wilson Snyder	e6857df5c6	Internals: Rename Ast on non-node classes (#3262 ). No functional change. This commit has the following replacements applied: s/\bAstUserInUseBase\b/VNUserInUseBase/g; s/\bAstAttrType\b/VAttrType/g; s/\bAstBasicDTypeKwd\b/VBasicDTypeKwd/g; s/\bAstDisplayType\b/VDisplayType/g; s/\bAstNDeleter\b/VNDeleter/g; s/\bAstNRelinker\b/VNRelinker/g; s/\bAstNVisitor\b/VNVisitor/g; s/\bAstPragmaType\b/VPragmaType/g; s/\bAstType\b/VNType/g; s/\bAstUser1InUse\b/VNUser1InUse/g; s/\bAstUser2InUse\b/VNUser2InUse/g; s/\bAstUser3InUse\b/VNUser3InUse/g; s/\bAstUser4InUse\b/VNUser4InUse/g; s/\bAstUser5InUse\b/VNUser5InUse/g; s/\bAstVarType\b/VVarType/g;	2022-01-02 14:03:20 -05:00
Wilson Snyder	24a0d2a0c9	Internals: Favor member assignment initialization. No functional change intended.	2022-01-01 11:46:49 -05:00
Wilson Snyder	ca42be982c	Copyright year update.	2022-01-01 08:26:40 -05:00
Wilson Snyder	cd737065f2	Internals: More const. No functional change intended.	2021-11-26 17:55:36 -05:00
Wilson Snyder	010084201a	Internals: Remove dead code.	2021-11-26 16:15:08 -05:00
Wilson Snyder	05e12ab60e	Internals: More const. No functional change intended.	2021-11-26 10:52:45 -05:00
Wilson Snyder	37e3c6da70	Internals: Add more const. No functional change intended.	2021-11-13 13:50:44 -05:00
Geza Lore	e69a8e838d	Improve memory usage of V3Partition. Only performance change intended. (#3192 )	2021-11-05 22:08:54 -04:00
Wilson Snyder	61612582e6	Improve memory usage of V3Partition. Only performance change intended.	2021-11-04 07:39:28 -04:00

1 2 3

111 Commits