Rely less on strings and represent AstNode classes as a 'class Node',
with all associated properties kept together, rather than distributed
over multiple dictionaries or constructed at retrieval time.
No functional change intended.
The recent patch to defer substitutions on V3Gate crashes on circular
logic that has cycle length >= 3 with all inlineable signals (cycle
length 2 is detected correctly and is not inlined). Fix by stopping
recursion at the loop-back edge.
Fixes#3543
This is detritus from when V3TraceDecl used to run after V3Gate, today
V3TraceDecl runs before V3Gate and this value has no function at all.
No functional change intended.
dynamic_cast is not free. Replace obvious instances (where the result is
unconditionally dereferenced) with static_cast in contexts with
performance implications.
Replace std::set<SiblingMC> with V3Lists to keep track of SiblingMCs
associated with MTasks, use a std::set<LogicMTask*> for ensuring
uniqueness. This yields a bit more speed in PartContraction.
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).
The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
queues and the scoreboard, instead of the old SortByValueMap. This
helps us avoid having to sort a lot of merge candidates that we will
never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
(the heap nodes in particular) directly in the object they relate to.
This eliminates a huge amount of lookups and helps a lot in
performance.
- Distribute storage for SiblingMC instances into the LogicMTask
instances, and combine with the sibling maps. This again eliminates
hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.
There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there
This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)
Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).
The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).
The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
queues and the scoreboard, instead of the old SortByValueMap. This
helps us avoid having to sort a lot of merge candidates that we will
never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
(the heap nodes in particular) directly in the object they relate to.
This eliminates a huge amount of lookups and helps a lot in
performance.
- Distribute storage for SiblingMC instances into the LogicMTask
instances, and combine with the sibling maps. This again eliminates
hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.
There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there
This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)
Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).
The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
While keeping the client code abstract in PartPropagateCp is nice for
testing, there is performance to be had removing the abstraction. As
this code dominates in scheduling large designs, we eliminate the
abstraction and re-work the testing to use the actual LogicMTask and
MTaskEdge graph types. No functional change intended.