Commit Graph

111 Commits

Author SHA1 Message Date
Geza Lore
ddb678cc5b Merge branch 'master' into develop-v5 2022-09-22 17:33:36 +01:00
Geza Lore
63c694f65f Streamline dump control options
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
  special case of `--dumpi-<tag>` where tag can be a magic word, or a
  filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
  works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
  special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
2022-09-22 17:24:41 +01:00
Geza Lore
95145038b4 Generate AstNode accessors via astgen
Introduce the @astgen directives parsed by astgen, currently used for
the generation of child node (operand) accessors. Please see the updated
internal documentation for details.
2022-09-21 14:05:27 +01:00
Geza Lore
ce03293128 Generate AstNode accessors via astgen
Introduce the @astgen directives parsed by astgen, currently used for
the generation of child node (operand) accessors. Please see the updated
internal documentation for details.
2022-09-21 13:56:03 +01:00
Wilson Snyder
a214fd1f78 Internals: Fix constructor syntax in new develop-v5 code 2022-09-17 08:56:41 -04:00
Geza Lore
af305bf280 Merge branch 'master' into develop-v5 2022-09-16 16:24:36 +01:00
Geza Lore
0c70a0dcbf Remove redundant 'virtual' keywords from overridden methods
'virtual' is redundant when 'override' is present, so keep only
'override'.

Add t/t_dist_cppstyle.pl to check for this.
2022-09-16 15:19:38 +01:00
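The style rule from the commit above can be illustrated with a minimal sketch. The `Base` and `Derived` types and the `costOf` helper are hypothetical names chosen for illustration and do not come from the Verilator sources.

```cpp
#include <cassert>

// Hypothetical base class with one virtual method.
struct Base {
    virtual ~Base() = default;
    virtual int cost() const { return 1; }
};

struct Derived final : Base {
    // Before the cleanup this would have read:
    //   virtual int cost() const override { return 2; }
    // 'override' only compiles on a virtual function, so 'virtual' adds nothing.
    int cost() const override { return 2; }
};

// Calls through the base interface; dispatch is unaffected by the style change.
int costOf(const Base* bp) {
    assert(bp);
    return bp->cost();
}
```

Dropping the keyword changes no behaviour: `costOf` still dispatches to `Derived::cost` for a `Derived` object.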
Geza Lore
90ab746a42 Make it possible to parallelize ico and act scheduling sections
Small fixup patch so the 'ico' and 'act' scheduling sections could be
ordered as multi-threaded. However, we still only order these single
threaded at the moment (but switching them to multi-threaded now works).
2022-09-06 16:01:13 +01:00
Geza Lore
298f71f2b1 Merge branch 'master' into develop-v5 2022-09-02 12:19:35 +01:00
Geza Lore
5c828b7e60 V3Partition: use V3Lists to keep track of SiblingMCs
Replace std::set<SiblingMC> with V3Lists to keep track of SiblingMCs
associated with MTasks, use a std::set<LogicMTask*> for ensuring
uniqueness. This yields a bit more speed in PartContraction.
2022-09-01 19:40:44 +01:00
Geza Lore
4640bea31a V3Partition: More improvements for PartFixDataHazards
- Remove redundant loop through the MTask graph
- Gather variables directly from the OrderGraph, which is simpler and
  faster.
2022-09-01 16:30:04 +01:00
Geza Lore
875361d7ce V3Partition: Reduce working set size of PartContraction (#3587)
This yields an additional 25% speedup of MT scheduling.
2022-09-01 16:29:40 +01:00
Geza Lore
c0f9b0d8f6 V3Partition: Refactor initialization of MTask dependencies
No functional change
2022-08-31 16:54:04 +01:00
Geza Lore
505bba14eb Improve PartFixDataHazards for clarity and speed.
- Use modern C++
- Implement OrderLogicVertex->LogicMTask map with
  OrderLogicVertex::userp(), instead of std::unordered_map
- Simplify data structures
- Simplify code and assert properties

No functional change.
2022-08-31 16:52:05 +01:00
Geza Lore
ebbe24966c Remove unnecessary virtual methods 2022-08-31 16:52:05 +01:00
Geza Lore
881c3f6e40 Minor optimization of PartContraction
Remove rarely used debug code from initialization loop.
2022-08-31 16:52:05 +01:00
Geza Lore
5c356a4680 Merge branch 'master' into develop-v5 2022-08-22 14:32:06 +01:00
Geza Lore
9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order in
which some candidates are added or removed; this can cause perturbation
in the output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
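For readers unfamiliar with the data structure named in the commit above, here is a minimal sketch of an intrusive min pairing heap, where the caller owns the nodes (in the spirit of storing heap nodes directly in the objects they relate to). The `Node`, `meld`, `insert`, and `removeMin` names are assumptions made for illustration; this is not the implementation used in Verilator.

```cpp
#include <cassert>
#include <utility>

// Hypothetical intrusive heap node; storage is owned by the caller.
struct Node {
    int key;
    Node* child = nullptr;    // First child
    Node* sibling = nullptr;  // Next sibling in the parent's child list
    explicit Node(int k) : key{k} {}
};

// Meld two heaps: the root with the larger key becomes the first child of the other.
Node* meld(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (b->key < a->key) std::swap(a, b);
    b->sibling = a->child;
    a->child = b;
    return a;
}

// Insert a fresh node (no child, no sibling) and return the new root.
Node* insert(Node* root, Node* n) { return meld(root, n); }

// Remove the minimum (the root) and return the new root, or nullptr if empty.
Node* removeMin(Node* root) {
    assert(root);
    Node* list = root->child;
    root->child = nullptr;
    // Pass 1: meld children pairwise, left to right, collecting results in reverse.
    Node* paired = nullptr;
    while (list) {
        Node* const a = list;
        Node* const b = a->sibling;
        list = b ? b->sibling : nullptr;
        a->sibling = nullptr;
        if (b) b->sibling = nullptr;
        Node* const m = meld(a, b);
        m->sibling = paired;
        paired = m;
    }
    // Pass 2: meld the collected heaps back into one.
    Node* result = nullptr;
    while (paired) {
        Node* const next = paired->sibling;
        paired->sibling = nullptr;
        result = meld(result, paired);
        paired = next;
    }
    return result;
}
```

Insertion is a single `meld`, and no ordering work is done on elements until they approach the root, which is the property the commit message relies on when it says candidates that are never considered need not be sorted.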
Wilson Snyder
ebb37b0156 Merge branch 'master' into develop-v5 2022-08-20 14:02:09 -04:00
Geza Lore
4d81eb021d Revert "Improve performance of MTask coarsening"
This reverts commit 83475008d9.
2022-08-19 18:03:45 +01:00
Geza Lore
83475008d9 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order in
which some candidates are added or removed; this can cause perturbation
in the output due to tied scores being broken based on IDs.
2022-08-19 16:59:20 +01:00
Geza Lore
03ac7ad730 Make PartPropagateCp specific to the MTask graph
While keeping the client code abstract in PartPropagateCp is nice for
testing, there is performance to be had removing the abstraction. As
this code dominates in scheduling large designs, we eliminate the
abstraction and re-work the testing to use the actual LogicMTask and
MTaskEdge graph types. No functional change intended.
2022-08-19 14:06:11 +01:00
Geza Lore
cd50949a7e Reuse MTaskEdge instances in MT scheduling
Instead of deleting then re-allocating MTaskEdge instances when merging
two MTasks, just redirect the edges of the donor MTask to the recipient
MTask. This is both faster as it avoids an allocation and a deletion,
together with one update of the sibling maps, and also makes the
algorithm more stable due to MergeCandidate IDs being stable and
allocated up front for all MTaskEdges, before any SiblingMCs are
allocated.

Perturbations in output are expected as the IDs used to break ties
between merge candidates with equal costs are not updated when
redirecting an edge (on purpose). The relinking of only one end of the
graph edges also perturbs the order in which they are enumerated, which
does change candidate opportunities when the number of edges is larger
than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when
IDs are updated and edges are updated to appear in their original order.
2022-08-19 14:06:11 +01:00
Geza Lore
f0040c7b9a Remove reliance on pointer comparison in MT scheduling
The critical path propagation used to rely on a pointer comparison to
break equal scoring critical path updates. Use the corresponding mtask
ids instead, which is deterministic across invocations.
2022-08-19 14:06:11 +01:00
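The idea in the commit above can be sketched as follows. The `Task` struct and the `better` comparison are hypothetical stand-ins, assuming each task carries a stable numeric id; the real critical path propagation code is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an MTask: a stable id plus a score.
struct Task {
    uint32_t id;
    uint32_t cost;
};

// Returns true if 'a' should be preferred over 'b'.
bool better(const Task* a, const Task* b) {
    assert(a && b);
    if (a->cost != b->cost) return a->cost > b->cost;
    // Tie: comparing the addresses 'a < b' would depend on where the allocator
    // placed the objects, which can differ between runs. The ids do not.
    return a->id < b->id;
}
```

With an id-based tie-break the result depends only on the input, so repeated invocations produce the same ordering.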
Geza Lore
f8a0389e73 Do not use stepCost when gathering sibling merge candidates
siblingPairFromRelatives gathers neighbours of a vertex, and sorts them.
It then takes the N best nodes, and creates sibling merge candidates
from them. We now use the unadjusted cost instead of the step cost of
the vertices when sorting. This is both faster as we need not do the
log-space rounding to compute stepCost, and will also make similar but
yet cheaper nodes appear closer to the front as we don't lose precision
in rounding, hence they are more likely to be entered as merge
candidates. Note that when creating the merge candidate, we still use
the stepCost, so its purpose of reducing the propagation of critical
path updates is maintained in full. In summary, this should make both
Verilator and the generated model very slightly faster, at least in
theory, and I have observed minor improvement in places.
2022-08-19 14:06:11 +01:00
Geza Lore
c266739e9f Merge branch 'master' into develop-v5 2022-08-05 12:17:57 +01:00
Geza Lore
96a4b3e5a5 Update clang-format config and apply
- Regroup and sort #include directives (like we used to, but automatic)
- Set AlwaysBreakTemplateDeclarations to true
2022-08-05 12:00:24 +01:00
Geza Lore
7403226a97 Merge branch 'master' into develop-v5 2022-08-04 10:03:38 +01:00
Geza Lore
fac8e76923 Rework SortByValueMap for better performance
Keep a single std::set of key/value pairs, and a single unordered_map
from key to iterators into the set. Also improve some of the accessing
mechanisms using modern C++. This speeds up multi-threaded ordering by
about 10%.
2022-08-03 21:17:02 +01:00
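The layout described in the commit above (one ordered set of pairs plus one hash map from key to set iterator) can be sketched roughly as below. `ByValueMap` and its methods are hypothetical and simplified; they are not the actual SortByValueMap interface.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <utility>

// Hypothetical simplified container keeping keys ordered by their value.
template <typename Key, typename Value>
class ByValueMap {
    using Pair = std::pair<Value, Key>;  // Value first, so the set orders by value
    std::set<Pair> m_sorted;             // Single ordered set of value/key pairs
    std::unordered_map<Key, typename std::set<Pair>::iterator> m_index;  // Key -> set entry

public:
    // Insert the key, or update its value if already present.
    void set(const Key& key, const Value& value) {
        const auto it = m_index.find(key);
        if (it != m_index.end()) {
            m_sorted.erase(it->second);  // Erase via iterator, no search in the set
            it->second = m_sorted.emplace(value, key).first;
        } else {
            m_index.emplace(key, m_sorted.emplace(value, key).first);
        }
    }
    void erase(const Key& key) {
        const auto it = m_index.find(key);
        if (it == m_index.end()) return;
        m_sorted.erase(it->second);
        m_index.erase(it);
    }
    bool empty() const { return m_sorted.empty(); }
    // Key with the smallest value.
    const Key& minKey() const {
        assert(!empty());
        return m_sorted.begin()->second;
    }
};
```

This works because `std::set` iterators stay valid when other elements are inserted or erased, so the map can hold them across updates.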
Geza Lore
b864f5f5ba V3Partition: use static_cast with LogicMTaskVertex
dynamic_cast is not free, and the mtask graph contains only
LogicMTaskVertex vertices, use static_cast instead for some speedup.
2022-08-03 17:05:01 +01:00
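A minimal sketch of the trade-off described in the commit above, using hypothetical `Vertex` and `LogicVertex` types rather than the real graph classes:

```cpp
#include <cassert>

// Hypothetical vertex hierarchy, for illustration only.
struct Vertex {
    virtual ~Vertex() = default;
};
struct LogicVertex final : Vertex {
    int cost = 7;
};

// Checked downcast: performs a run-time type check on every call.
int costChecked(Vertex* vp) {
    LogicVertex* const lp = dynamic_cast<LogicVertex*>(vp);
    return lp ? lp->cost : -1;
}

// Unchecked downcast: valid only because every vertex is known to be a LogicVertex.
int costUnchecked(Vertex* vp) {
    assert(dynamic_cast<LogicVertex*>(vp));  // Debug-only sanity check
    return static_cast<LogicVertex*>(vp)->cost;
}
```

The `static_cast` version is undefined behaviour if the invariant is ever violated, which is why it is only appropriate where the graph is guaranteed to be homogeneous.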
Wilson Snyder
4859f5e1fa Merge branch 'master' into develop-v5 2022-07-30 10:26:16 -04:00
Wilson Snyder
b9d7819faa Internals: Fix some cppcheck issues. Some dump functions fixed. 2022-07-30 10:01:39 -04:00
Geza Lore
582da6df9a Merge branch 'master' into develop-v5 2022-07-14 10:08:52 +01:00
Geza Lore
87f1e06c41 Small algorithmic improvement of PartContraction::siblingPairFromRelatives
Use std::partial_sort for the non-exhaustive case. This is O(n) instead
of O(n*log(n)) in the size of the candidate list being sorted. (It
actually is O(n*log(k)), but k is constant 6 in the non-exhaustive
case).
2022-07-12 19:10:01 +01:00
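As an illustration of the technique in the commit above, the sketch below selects the k cheapest entries with `std::partial_sort`. The `bestK` helper and the use of plain `int` costs are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Return the k smallest values in ascending order (all values if k exceeds the size).
std::vector<int> bestK(std::vector<int> costs, std::size_t k) {
    k = std::min(k, costs.size());
    const auto mid = costs.begin() + static_cast<std::ptrdiff_t>(k);
    // Only the first k positions end up sorted; the rest is left in unspecified order.
    std::partial_sort(costs.begin(), mid, costs.end());
    costs.resize(k);
    assert(std::is_sorted(costs.begin(), costs.end()));
    return costs;
}
```

When k is a small constant, the work grows roughly linearly with the number of candidates, which matches the complexity argument in the commit message.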
Geza Lore
7e8bafd217 Remove static data use from PartContraction::siblingPairFromRelatives
Use std::sort with lambda rather than qsort with static function and
static data. Verilation performance neutral.
2022-07-12 19:09:40 +01:00
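The motivation for the change above can be sketched as follows: a `qsort` comparator is a plain function pointer and can only reach extra context through static data, while a lambda can capture it. The `Item` type, the `penaltyId` parameter, and the scoring rule are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sortable record.
struct Item {
    int id;
    int cost;
};

// Sort by cost, penalising one id. The context ('penaltyId') is captured by the
// lambda rather than being stashed in a static variable for a qsort callback.
void sortByAdjustedCost(std::vector<Item>& items, int penaltyId) {
    const auto adjusted = [penaltyId](const Item& i) {
        return i.cost + (i.id == penaltyId ? 100 : 0);
    };
    std::sort(items.begin(), items.end(), [&adjusted](const Item& a, const Item& b) {
        const int ca = adjusted(a);
        const int cb = adjusted(b);
        if (ca != cb) return ca < cb;
        return a.id < b.id;  // Deterministic tie-break
    });
}
```

Because no static state is involved, the function is re-entrant and can be called from several threads on different vectors.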
Geza Lore
282887d9c6 Fix code coverage holes
Fixes #3422
2022-05-16 21:22:21 +01:00
Geza Lore
599d23697d IEEE compliant scheduler (#3384)
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.

With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).

Details of the new scheduling model and algorithm are provided in
docs/internals.rst.

Implements #3278
2022-05-15 16:03:32 +01:00
HungMingWu
880a9be3b1 Internal: Add C++20ish reverse_view for range loops. No functional change (#3388).
Signed-off-by: HungMingWu <u9089000@gmail.com>
2022-04-18 13:03:56 -04:00
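A reverse view adaptor of the kind mentioned in the commit above might look roughly like this. `ReverseView`, `reverseView`, and `reversedCopy` are hypothetical names for a minimal sketch; the adaptor actually added by the commit may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal adaptor exposing a container's reverse iterators as begin/end.
template <typename Container>
class ReverseView {
    Container& m_c;

public:
    explicit ReverseView(Container& c) : m_c{c} {}
    auto begin() const -> decltype(m_c.rbegin()) { return m_c.rbegin(); }
    auto end() const -> decltype(m_c.rend()) { return m_c.rend(); }
};

// Helper so the template argument is deduced at the call site.
template <typename Container>
ReverseView<Container> reverseView(Container& c) {
    return ReverseView<Container>{c};
}

// Example use in a range-based for loop.
std::vector<int> reversedCopy(const std::vector<int>& in) {
    std::vector<int> out;
    for (const int v : reverseView(in)) out.push_back(v);
    assert(out.size() == in.size());
    return out;
}
```

The view holds only a reference, so the container must outlive the loop that iterates over it.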
Geza Lore
fbd568dc47 Prep for multiple AstExecGraph. No functional change. 2022-04-10 12:00:17 +01:00
Wilson Snyder
e02f97854c Deprecate 'vluint64_t' and similar types (#3255). 2022-03-27 15:27:40 -04:00
Geza Lore
b1b5b5dfe2 Improve run-time profiling
The --prof-threads option has been split into two independent options:
1. --prof-exec, for collecting verilator_gantt and other execution
related profiling data, and
2. --prof-pgo, for collecting data needed for PGO

The implementation of execution profiling is extricated from
VlThreadPool and is now a separate class VlExecutionProfiler. This means
--prof-exec can now be used for single-threaded models (though it does
not measure a lot of things just yet). For consistency VerilatedProfiler
is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are
in verilated_profiler.{h/cpp}, but can be used completely independently.

Also re-worked the execution profile format so it now only emits events
without holding onto any temporaries. This is in preparation for some
future optimizations that would be hindered by the introduction of function
locals via AstText.

Also removed the Barrier event. Clearing the profile buffers is not
notably more expensive as the profiling records are trivially
destructible.
2022-03-27 15:57:30 +02:00
Wilson Snyder
e6857df5c6 Internals: Rename Ast on non-node classes (#3262). No functional change.
This commit has the following replacements applied:

        s/\bAstUserInUseBase\b/VNUserInUseBase/g;
        s/\bAstAttrType\b/VAttrType/g;
        s/\bAstBasicDTypeKwd\b/VBasicDTypeKwd/g;
        s/\bAstDisplayType\b/VDisplayType/g;
        s/\bAstNDeleter\b/VNDeleter/g;
        s/\bAstNRelinker\b/VNRelinker/g;
        s/\bAstNVisitor\b/VNVisitor/g;
        s/\bAstPragmaType\b/VPragmaType/g;
        s/\bAstType\b/VNType/g;
        s/\bAstUser1InUse\b/VNUser1InUse/g;
        s/\bAstUser2InUse\b/VNUser2InUse/g;
        s/\bAstUser3InUse\b/VNUser3InUse/g;
        s/\bAstUser4InUse\b/VNUser4InUse/g;
        s/\bAstUser5InUse\b/VNUser5InUse/g;
        s/\bAstVarType\b/VVarType/g;
2022-01-02 14:03:20 -05:00
Wilson Snyder
24a0d2a0c9 Internals: Favor member assignment initialization. No functional change intended. 2022-01-01 11:46:49 -05:00
Wilson Snyder
ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Wilson Snyder
cd737065f2 Internals: More const. No functional change intended. 2021-11-26 17:55:36 -05:00
Wilson Snyder
010084201a Internals: Remove dead code. 2021-11-26 16:15:08 -05:00
Wilson Snyder
05e12ab60e Internals: More const. No functional change intended. 2021-11-26 10:52:45 -05:00
Wilson Snyder
37e3c6da70 Internals: Add more const. No functional change intended. 2021-11-13 13:50:44 -05:00
Geza Lore
e69a8e838d Improve memory usage of V3Partition. Only performance change intended. (#3192) 2021-11-05 22:08:54 -04:00
Wilson Snyder
61612582e6 Improve memory usage of V3Partition. Only performance change intended. 2021-11-04 07:39:28 -04:00