Commit Graph

1808 Commits

Author SHA1 Message Date
Wilson Snyder
227e61f891 Fix comparing ranged slices of unpacked arrays. 2022-11-11 18:01:30 -05:00
Wilson Snyder
9d7c4d9af3 Fix wait 0. 2022-11-11 17:18:59 -05:00
Wilson Snyder
16586d1d37 Fix tracing parameters overridden with -G (#3723). 2022-11-10 20:30:10 -05:00
Wilson Snyder
e64295e92b Fix missing UNUSED warnings with --coverage (#3736). 2022-11-09 21:45:14 -05:00
Wilson Snyder
167f4ebbd4 Commentary: Changes update 2022-11-05 09:32:41 -04:00
Wilson Snyder
0b0b642241 devel release 2022-10-29 17:54:12 -04:00
Wilson Snyder
52d29d238c Version bump 2022-10-29 17:45:54 -04:00
Wilson Snyder
5c658f8cd5 Fix width mismatch on inside operator (#3714). 2022-10-28 06:38:49 -04:00
Wilson Snyder
5f9b0929b4 Commentary: Changes update 2022-10-27 21:35:09 -04:00
Wilson Snyder
4154584c4b Commentary: Changes update 2022-10-21 20:04:07 -04:00
Wilson Snyder
a57a3579c0 Fix false LATCH warning on 'unique if' (#3088). 2022-10-21 19:10:06 -04:00
Wilson Snyder
347e9b4ec8 Fix cell assigning integer array parameters (#3299). 2022-10-21 18:26:39 -04:00
Wilson Snyder
79682e6072 Support empty generate_regions (#3695). [mpb27] 2022-10-20 22:04:50 -04:00
Wilson Snyder
7e1b92fa75 Add --get-supported to determine what features are in Verilator (#3688). 2022-10-20 21:42:30 -04:00
Wilson Snyder
e7068369fe Fix $display of fixed-width numbers (#3565). 2022-10-18 21:10:35 -04:00
Wilson Snyder
b930d0731a Fix foreach and pre/post increment in functions (#3613). 2022-10-18 20:04:09 -04:00
Wilson Snyder
2723223884 Fix LSB error on --hierarchical submodules (#3539). 2022-10-18 17:29:51 -04:00
Wilson Snyder
cb7b024e8f Commentary: Spelling, and add upgrade notes (#3462) 2022-10-16 11:10:41 -04:00
Wilson Snyder
b16b607b98 Commentary: Changes update 2022-10-15 11:04:03 -04:00
Wilson Snyder
732d5bea10 Commentary: Standard format for company contributions 2022-10-15 10:59:31 -04:00
Wilson Snyder
d9a0d0ade2 Commentary (fix earlier commit) 2022-10-15 10:57:46 -04:00
Wilson Snyder
14f58ed6c7 Add error on real edge event control. 2022-10-15 06:21:34 -04:00
Geza Lore
b2070a9407 Commentary: Mention DFG in changes 2022-10-12 10:21:02 +01:00
Wilson Snyder
880cac2fdd Merge branch 'master' into develop-v5 2022-10-01 11:24:55 -04:00
Wilson Snyder
0b843ada03 devel release 2022-10-01 08:34:43 -04:00
Wilson Snyder
746c7ea8f7 Version bump 2022-10-01 08:28:27 -04:00
Wilson Snyder
fa4b10b4d9 Commentary: Changes update 2022-09-30 23:03:26 -04:00
Wilson Snyder
cd2a5771b8 Add --timing to --binary (#3625). 2022-09-28 19:02:23 -04:00
Wilson Snyder
b92173bf3d Add --binary option as alias of --main --exe --build (#3625). 2022-09-28 09:04:33 -04:00
Wilson Snyder
d162619bd3 Merge branch 'master' into develop-v5 2022-09-20 20:06:21 -04:00
Wilson Snyder
fc4ffd454e Rename --bin to --build-dep-bin. 2022-09-18 10:32:43 -04:00
Geza Lore
27031ed688 Merge branch 'master' into develop-v5 2022-09-15 10:28:35 +01:00
Wilson Snyder
75fd71d7e5 Add --main to generate main() C++ (previously was experimental only) (#3265). 2022-09-14 20:18:40 -04:00
Wilson Snyder
9efd64ab98 Commentary 2022-09-14 20:13:28 -04:00
Wilson Snyder
7aa01625d8 Commentary: Changes update 2022-09-14 08:15:42 -04:00
Wilson Snyder
81fe35ee2e Fix typedef'ed class conversion to boolean (#3616). 2022-09-12 18:03:56 -04:00
Geza Lore
fd6275a62b Merge branch 'master' into develop-v5 2022-09-05 17:03:43 +01:00
Geza Lore
d42a2d6494 Fix V3Gate crash on circular logic
The recent patch to defer substitutions on V3Gate crashes on circular
logic that has cycle length >= 3 with all inlineable signals (cycle
length 2 is detected correctly and is not inlined). Fix by stopping
recursion at the loop-back edge.

Fixes #3543
2022-09-02 19:58:58 +01:00
Wilson Snyder
849bb5590a Merge branch 'master' into develop-v5 2022-08-31 19:51:07 -04:00
Wilson Snyder
8d0c06e570 devel release 2022-08-31 19:49:24 -04:00
Wilson Snyder
5b2fbf4f37 Version bump 2022-08-31 19:46:45 -04:00
Wilson Snyder
592dab2bdb Commentary: Changes update 2022-08-31 19:27:43 -04:00
Wilson Snyder
51daa64e9a Fix --hierarchical with order-based pin connections (#3585). 2022-08-31 18:12:21 -04:00
Wilson Snyder
819e8741cc Merge branch 'master' into develop-v5 2022-08-30 00:20:21 -04:00
Wilson Snyder
c335aad25f Fix --hierarchical with order-based pin connections (#3583). 2022-08-29 22:49:19 -04:00
Wilson Snyder
2358ced061 Rename tracing rolloverSize and add test (#3570). 2022-08-28 08:25:02 -04:00
Geza Lore
5c356a4680 Merge branch 'master' into develop-v5 2022-08-22 14:32:06 +01:00
Krzysztof Bieganski
39af5d020e
Timing support (#3363)
Adds timing support to Verilator. It makes it possible to use delays,
event controls within processes (not just at the start), wait
statements, and forks.

Building a design with those constructs requires a compiler that
supports C++20 coroutines (GCC 10, Clang 5).

The basic idea is to have processes and tasks with delays/event controls
implemented as C++20 coroutines. This allows us to suspend and resume
them at any time.

There are five main runtime classes responsible for managing suspended
coroutines:
* `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle`
  with move semantics and automatic cleanup.
* `VlDelayScheduler`, for coroutines suspended by delays. It resumes
  them at a proper simulation time.
* `VlTriggerScheduler`, for coroutines suspended by event controls. It
  resumes them if its corresponding trigger was set.
* `VlForkSync`, used for syncing `fork..join` and `fork..join_any`
  blocks.
* `VlCoroutine`, the return type of all verilated coroutines. It allows
  for suspending a stack of coroutines (normally, C++ coroutines are
  stackless).

There is a new visitor in `V3Timing.cpp` which:
  * scales delays according to the timescale,
  * simplifies intra-assignment timing controls and net delays into
    regular timing controls and assignments,
  * simplifies wait statements into loops with event controls,
  * marks processes and tasks with timing controls in them as
    suspendable,
  * creates delay, trigger scheduler, and fork sync variables,
  * transforms timing controls and fork joins into C++ awaits

There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`)
that integrate static scheduling with timing. This involves providing
external domains for variables, so that the necessary combinational
logic gets triggered after coroutine resumption, as well as statements
that need to be injected into the design eval function to perform this
resumption at the correct time.

There is also a function that transforms forked processes into separate
functions.

See the comments in `verilated_timing.h`, `verilated_timing.cpp`,
`V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals
documentation for more details.

Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-08-22 13:26:32 +01:00
Geza Lore
9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
Wilson Snyder
ebb37b0156 Merge branch 'master' into develop-v5 2022-08-20 14:02:09 -04:00