Commit Graph

80 Commits

Author SHA1 Message Date
Kamil Rakoczy
93d50c4499
Internals: Add mutex to V3Error (#3680) 2023-02-09 22:15:37 -05:00
Wilson Snyder
b24d7c83d3 Copyright year update 2023-01-01 10:18:39 -05:00
Wilson Snyder
a0e7930036 docs: Fix spelling 2022-12-09 22:39:41 -05:00
Wilson Snyder
833780fac1 Internal: cppcheck fixes. No functional change intended. 2022-11-27 05:52:40 -05:00
Wilson Snyder
0c75d4eaca Internals: Fix constructor style. 2022-11-10 22:58:27 -05:00
Geza Lore
050060b139 Make enum constructors and operators constexpr 2022-09-23 11:10:28 +01:00
Geza Lore
63c694f65f Streamline dump control options
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
  special case of `--dumpi-<tag>` where tag can be a magic word, or a
  filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
  works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
  special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
2022-09-22 17:24:41 +01:00
Geza Lore
38a8d7fb2e Remove redundant 'inline' keywords from definitions
Also add checks to t/t_dist_cppstyle
2022-09-16 15:52:25 +01:00
Geza Lore
9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
Geza Lore
4d81eb021d Revert "Improve performance of MTask coarsening"
This reverts commit 83475008d9.
2022-08-19 18:03:45 +01:00
Geza Lore
83475008d9 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-19 16:59:20 +01:00
Geza Lore
cd50949a7e Reuse MTaskEdge instances in MT scheduling
Instead of deleting then re-allocating MTaskEdge instances when merging
two MTasks, just redirect the edged of the donor MTask to the recipient
MTask. This is both faster as it avoids an allocation and a deletion,
together with one update of the sibling maps, and also makes the
algorithm more stable due to MergeCandidate IDs being stable and
allocated up front for all MTaskEdges, before any SiblingMCs are
allocated.

Perturbations in output are expected as the IDs used to break ties
between merge candidates with equal costs are not updated when
redirecting an edge (on purpose). The relinking of only one end of the
graph edges also perturbs the order in which they are enumerated, which
does change candidate opportunities when the number of edges is larger
than PART_SIBLING_EDGE_LIMIT. Confirmed output is identical when
IDs are updated and edges are updated to appear in their original order.
2022-08-19 14:06:11 +01:00
Geza Lore
b436794773 Add specialized GraphStreamUnordered
GraphStreamUnordered used to be GraphStream<std::less<const
V3GraphVertex*>>, but a lot of performance improvements can be had by a
specialized implementation, so added a highly optimized one. This helps
a lot with --debug-partition.
2022-08-19 14:06:11 +01:00
Geza Lore
a2792785fe Add V3GraphVertex::dotRank to add GraphViz ranks to graph dumps
This is a simple debugging aid to allow constraining the graph layout
via GraphViz rank directives. Note this is not related in any way to the
vertex 'rank' attribute used by some of the graph algorithms.

No functional change.
2022-05-02 10:27:26 +01:00
Geza Lore
2ba9eb4228 Speed up TSP sort implementation
- More efficient comparison by pre-computing sorting keys.
- Remove work items in algorithms known to be redundant earlier.
  This greatly reduces data structure sizes.
- Use V3GraphVertex->user() for state tracking instead of unordered_map
  while both of these are constant time, they do add up.
- In `makeMinSpanningTree`, instead of batch inserting outgoing edges of
  each visited vertex into an ordered set, keep an ordered set of sorted
  vectors of edges. This reduces the size of the ordered set
  significantly (it is now O(V) rather than O(E), and as the subject
  graph is a complete graph, V ~ sqrt(E), so this is a significant gain).
- Use a vector + sorting in `perfectMatching` instead of an ordered set.
  This is faster on large working sets.

This yields 3.8x speedup on the variable order pass and overall 14%
verilation speed gain on a large design.
2022-01-07 12:05:52 +00:00
Wilson Snyder
ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Geza Lore
987ce927eb Remove unused code. No functional change. 2021-11-09 19:46:19 +00:00
Wilson Snyder
3a55600913 Internals: Restyle with C++11 using replacing typedef 2021-03-12 18:10:45 -05:00
Wilson Snyder
be31fdcfe4 Use Google-style-guide header guard naming, to avoid __ prefix. 2021-03-03 21:57:07 -05:00
Wilson Snyder
bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder
c23de458ed Misc internal coverage cleanups 2020-12-08 08:40:22 -05:00
Wilson Snyder
b6ded59c2b Internals: Use and enforce class final for ~5% performance boost. 2020-11-18 21:32:16 -05:00
Wilson Snyder
1b0a48ea02 Internals: Use C++11 = default where obvious. No functional change intended. 2020-11-16 19:56:16 -05:00
Wilson Snyder
b67f1f0e94 Fix GCC warnings 2020-08-18 08:10:44 -04:00
Wilson Snyder
78aee6f4e7 C++11: Use sized enums (+4% performance). 2020-08-16 12:05:35 -04:00
Wilson Snyder
034737d2a8 C++11: Use member declaration initalizations (in nodes). No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder
c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Wilson Snyder
5c966ec510 clang-format many files. No functional change.
Use nodist/clang_formatter to reformat files that are now clean.
2020-04-13 22:52:23 -04:00
Wilson Snyder
1ce360ed5b Add SPDX license identifiers. No functional change. 2020-03-21 11:24:24 -04:00
Wilson Snyder
0aabe6ce00 Internals: Fix cppcheck warning including missing init. 2020-02-03 22:10:29 -05:00
Wilson Snyder
73f5e3f808 Internals: Add missing const. No functional change. 2020-02-02 10:34:29 -05:00
Wilson Snyder
eafed88a6e Internals: Add assertions. No functional change intended. 2020-01-25 10:19:59 -05:00
Wilson Snyder
a4e8d39932 Spelling fixes 2020-01-24 20:10:44 -05:00
Wilson Snyder
f23fe8fd84 Update copyright year. 2020-01-06 18:05:53 -05:00
Julien Margetts
cafb148a62 Commentary 2019-12-17 18:27:47 -05:00
Wilson Snyder
5811ec07e6 Update URLs to https://verilator.org 2019-11-07 22:33:59 -05:00
Wilson Snyder
771a301f66 Commentary: Remove newlines, upsets some patches. No functional change. 2019-10-04 20:17:11 -04:00
Wilson Snyder
e1e4bde125 Remove old V3ClkGater code 2019-08-27 17:51:06 -04:00
Wilson Snyder
b83b606267 Internals: Detab and fix spacing style issues. No functional change.
When diff, recommend using "git diff --ignore-all-space"
When merging, recommend using "git merge -Xignore-all-space"
2019-05-19 16:13:13 -04:00
Wilson Snyder
8a4aeddbb0 Copyright year update. 2019-01-03 19:17:22 -05:00
Wilson Snyder
d87b9d25ca Internals: Cleanup and standardize include order. No functional change intended. 2018-10-14 13:59:40 -04:00
Wilson Snyder
e37dce9d85 Internals: Add new graph algs for future partitioning. 2018-07-15 22:09:27 -04:00
Wilson Snyder
43694ec87c Continued... Show file and line info when possible on internal graph errors. 2018-07-14 20:44:43 -04:00
Wilson Snyder
cf4bf9b7a5 Show file and line info when possible on internal graph errors. 2018-07-14 18:45:06 -04:00
Wilson Snyder
338ebcd6f0 Internal: Minor style cleanups for next merge. No functional change. 2018-07-14 17:27:05 -04:00
Wilson Snyder
84562f98de Internals: Add GraphWay methods for future graph algs. No functional change. 2018-07-08 22:01:16 -04:00
Wilson Snyder
39ecfd9900 Internals: rank() public for future optimizers. 2018-06-26 17:57:57 -04:00
Wilson Snyder
4c9c39bd08 Merge from master 2018-06-16 07:32:32 -04:00
Wilson Snyder
6e7f28785e Internals: Cleanup graph includes. No functional change. 2018-06-15 06:54:03 -04:00
Wilson Snyder
efb2801eeb Internals: Add orderPreRanked. No functional change. 2018-06-14 20:29:54 -04:00