Commit Graph

5368 Commits

Author SHA1 Message Date
Wilson Snyder
5a1bcf9794 Tests: Add lint-py checker 2022-09-07 22:04:57 -04:00
Wilson Snyder
361cef4633 Fix pylint warnings. 2022-09-07 21:48:52 -04:00
Geza Lore
90ab746a42 Make it possible to parallelize ico and act scheduling sections
Small fixup patch so the 'ico' and 'act' scheduling sections could be
ordered as multi-threaded. However, we still only order these single
threaded at the moment (but switching them to multi-threaded now works).
2022-09-06 16:01:13 +01:00
github action
e94cdcf29c Apply 'make format' 2022-09-05 22:43:09 +00:00
Mladen Slijepcevic
1af046986d
Fix thread saftey in SystemC VL_ASSIGN_SBW/WSB (#3494) (#3513). 2022-09-05 18:42:12 -04:00
Wilson Snyder
1c9263a25b Commentary 2022-09-05 15:20:08 -04:00
Geza Lore
fd6275a62b Merge branch 'master' into develop-v5 2022-09-05 17:03:43 +01:00
Krzysztof Bieganski
6b6790fc50 Preserve return type of AstNode::addNext via templating (#3597)
No functional change intended.

Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-09-05 16:56:57 +01:00
Krzysztof Bieganski
fb931087ab
Add stats tracking for V3Undriven. (#3600)
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-09-05 16:20:38 +01:00
Krzysztof Bieganski
a2e1b32a1c
Fix inlining of forks (#3594)
Before this change, some forked processes were being inlined in
`V3Timing` because they contained no `CAwait`s. This only works under
the assumption that no `CAwait`s will be added there later, which is not
true, as a function called by a forked process could be turned into a
coroutine later. The call would be wrapped in a new `CAwait`, but the
process itself would have already been inlined at this point.

This commit moves the inlining to `transformForks` in `V3SchedTiming`,
which is called at a point when all `CAwait`s are already in place.

Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-09-05 15:19:19 +01:00
Krzysztof Bieganski
54f89bce42
Move SenExprBuilder to a header. (#3598)
No functional change intended.

Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-09-05 15:17:51 +01:00
Krzysztof Bieganski
8b19d02e3b
Fix co_await VlNow{} being added too many times (#3596)
(or not at all)

Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-09-05 11:46:34 +01:00
Krzysztof Bieganski
da7ad35577
Fix fork debug output (#3593)
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-09-05 11:27:24 +01:00
Geza Lore
937e893b6d
Build verilator_bin with -O3 (#3592)
This is consistently a few percent faster.
2022-09-03 22:10:07 +01:00
Geza Lore
d42a2d6494 Fix V3Gate crash on circular logic
The recent patch to defer substitutions on V3Gate crashes on circular
logic that has cycle length >= 3 with all inlineable signals (cycle
length 2 is detected correctly and is not inlined). Fix by stopping
recursion at the loop-back edge.

Fixes #3543
2022-09-02 19:58:58 +01:00
Geza Lore
8e8f4b1e5c Remove AstVarScope::valuep() and related code
This is detritus from when V3TraceDecl used to run after V3Gate, today
V3TraceDecl runs before V3Gate and this value has no function at all.

No functional change intended.
2022-09-02 16:44:13 +01:00
Geza Lore
298f71f2b1 Merge branch 'master' into develop-v5 2022-09-02 12:19:35 +01:00
Geza Lore
2ba39b25f1 Replace dynamic_casts with static_casts
dynamic_cast is not free. Replace obvious instances (where the result is
unconditionally dereferenced) with static_cast in contexts with
performance implications.
2022-09-02 12:08:34 +01:00
Geza Lore
5c828b7e60 V3Partition: use V3Lists to keep track of SiblingMCs
Replace std::set<SiblingMC> with V3Lists to keep track of SiblingMCs
associated with MTasks, use a std::set<LogicMTask*> for ensuring
uniqueness. This yields a bit more speed in PartContraction.
2022-09-01 19:40:44 +01:00
Geza Lore
4640bea31a V3Partition: More improvements for PartFixDataHazards
- Remove redundant loop through the MTask graph
- Gather variables directly from the OrderGraph, which is simpler and
  faster.
2022-09-01 16:30:04 +01:00
Geza Lore
875361d7ce
V3Partition: Reduce working set size of PartContraction (#3587)
This yields an additional 25% speedup of MT scheduling.
2022-09-01 16:29:40 +01:00
Wilson Snyder
849bb5590a Merge branch 'master' into develop-v5 2022-08-31 19:51:07 -04:00
Wilson Snyder
8d0c06e570 devel release 2022-08-31 19:49:24 -04:00
Wilson Snyder
5b2fbf4f37 Version bump 2022-08-31 19:46:45 -04:00
Wilson Snyder
592dab2bdb Commentary: Changes update 2022-08-31 19:27:43 -04:00
Wilson Snyder
51daa64e9a Fix --hierarchical with order-based pin connections (#3585). 2022-08-31 18:12:21 -04:00
Geza Lore
c0f9b0d8f6 V3Partition: Refactor initialization of MTask dependencies
No functional change
2022-08-31 16:54:04 +01:00
Geza Lore
505bba14eb Improve PartFixDataHazards for clarity and speed.
- Use modern C++
- Implement OrderLogicVertex->LogicMTask map with
  OrderLogicVertex::userp(), insteas of std::unordered_map
- Simplify data structures
- Simplify code and assert properties

No functional change.
2022-08-31 16:52:05 +01:00
Geza Lore
ebbe24966c Remove unnecessary virtual methods 2022-08-31 16:52:05 +01:00
Geza Lore
881c3f6e40 Minor optimization of PartContraction
Remove rarely used debug code from initialization loop.
2022-08-31 16:52:05 +01:00
Geza Lore
546aeab9f2 V3Order: Minor refactoring for clarity
Refactor ProcessMoveBuildGraph utilizing the fact that OrderGraph is a
bipartite graph, also remove unnecessary unordered_map and distribute
variable domain map. No functional change.
2022-08-31 16:52:05 +01:00
Geza Lore
8de21e9bb7 Document and ensure OrderGraph is bipartite
Minor refactoring and documentation. No functional change.
2022-08-31 16:52:05 +01:00
Geza Lore
2ecda74471 Merge branch 'master' into develop-v5 2022-08-31 10:45:18 +01:00
Aleksander Kiryk
2136afde6b
Support negated properties (#3572) 2022-08-30 06:33:42 -04:00
Wilson Snyder
ea55db7286 Internals: Cleanup some string constructors. No functional change. 2022-08-30 01:02:39 -04:00
Wilson Snyder
819e8741cc Merge branch 'master' into develop-v5 2022-08-30 00:20:21 -04:00
Wilson Snyder
6a5f77b278 Internals: Cleanup some string/model constructors. No functional change. 2022-08-29 23:50:32 -04:00
Wilson Snyder
8658a0d7dc Internals: Constructor format update. No functional change. 2022-08-29 23:05:52 -04:00
Wilson Snyder
c335aad25f Fix --hierarchical with order-based pin connections (#3583). 2022-08-29 22:49:19 -04:00
Wilson Snyder
9d9d647c1f Fix indentation of --protect import function SV code. 2022-08-29 22:28:02 -04:00
Wilson Snyder
d47a37fb76 Internals: Cleanup constructors etc. No functional change. 2022-08-29 22:17:27 -04:00
Aleksander Kiryk
24ec84851a
Support $sampled (#3569) 2022-08-29 08:39:41 -04:00
Arkadiusz Kozdra
0a3a15a66e
Support class parameters (#2231) (#3541) 2022-08-28 10:24:55 -04:00
Wilson Snyder
2358ced061 Rename tracing rolloverSize and add test (#3570). 2022-08-28 08:25:02 -04:00
Krzysztof Bieganski
2af5304884
Fix tracing of slow coroutines (#3576 part) (#3579) 2022-08-26 05:11:44 -05:00
Varun Koyyalagunta
5869fdf7f6
Fix $dump systemtask with --output-split-cfuncs (#3495) (#3497) 2022-08-25 18:29:11 -05:00
Krzysztof Bieganski
1a1d2ecfd9
Enable tracing in generated main (#3578)
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-08-25 14:55:37 +01:00
Geza Lore
5c356a4680 Merge branch 'master' into develop-v5 2022-08-22 14:32:06 +01:00
Krzysztof Bieganski
39af5d020e
Timing support (#3363)
Adds timing support to Verilator. It makes it possible to use delays,
event controls within processes (not just at the start), wait
statements, and forks.

Building a design with those constructs requires a compiler that
supports C++20 coroutines (GCC 10, Clang 5).

The basic idea is to have processes and tasks with delays/event controls
implemented as C++20 coroutines. This allows us to suspend and resume
them at any time.

There are five main runtime classes responsible for managing suspended
coroutines:
* `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle`
  with move semantics and automatic cleanup.
* `VlDelayScheduler`, for coroutines suspended by delays. It resumes
  them at a proper simulation time.
* `VlTriggerScheduler`, for coroutines suspended by event controls. It
  resumes them if its corresponding trigger was set.
* `VlForkSync`, used for syncing `fork..join` and `fork..join_any`
  blocks.
* `VlCoroutine`, the return type of all verilated coroutines. It allows
  for suspending a stack of coroutines (normally, C++ coroutines are
  stackless).

There is a new visitor in `V3Timing.cpp` which:
  * scales delays according to the timescale,
  * simplifies intra-assignment timing controls and net delays into
    regular timing controls and assignments,
  * simplifies wait statements into loops with event controls,
  * marks processes and tasks with timing controls in them as
    suspendable,
  * creates delay, trigger scheduler, and fork sync variables,
  * transforms timing controls and fork joins into C++ awaits

There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`)
that integrate static scheduling with timing. This involves providing
external domains for variables, so that the necessary combinational
logic gets triggered after coroutine resumption, as well as statements
that need to be injected into the design eval function to perform this
resumption at the correct time.

There is also a function that transforms forked processes into separate
functions.

See the comments in `verilated_timing.h`, `verilated_timing.cpp`,
`V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals
documentation for more details.

Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-08-22 13:26:32 +01:00
Geza Lore
9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00