Commit Graph

1757 Commits

Author SHA1 Message Date
Wilson Snyder
fc4ffd454e Rename --bin to --build-dep-bin. 2022-09-18 10:32:43 -04:00
Wilson Snyder
75fd71d7e5 Add --main to generate main() C++ (previously was experimental only) (#3265). 2022-09-14 20:18:40 -04:00
Wilson Snyder
9efd64ab98 Commentary 2022-09-14 20:13:28 -04:00
Wilson Snyder
7aa01625d8 Commentary: Changes update 2022-09-14 08:15:42 -04:00
Wilson Snyder
81fe35ee2e Fix typedef'ed class conversion to boolean (#3616). 2022-09-12 18:03:56 -04:00
Geza Lore
d42a2d6494 Fix V3Gate crash on circular logic
The recent patch to defer substitutions on V3Gate crashes on circular
logic that has cycle length >= 3 with all inlineable signals (cycle
length 2 is detected correctly and is not inlined). Fix by stopping
recursion at the loop-back edge.

Fixes #3543
2022-09-02 19:58:58 +01:00
Wilson Snyder
8d0c06e570 devel release 2022-08-31 19:49:24 -04:00
Wilson Snyder
5b2fbf4f37 Version bump 2022-08-31 19:46:45 -04:00
Wilson Snyder
592dab2bdb Commentary: Changes update 2022-08-31 19:27:43 -04:00
Wilson Snyder
51daa64e9a Fix --hierarchical with order-based pin connections (#3585). 2022-08-31 18:12:21 -04:00
Wilson Snyder
c335aad25f Fix --hierarchical with order-based pin connections (#3583). 2022-08-29 22:49:19 -04:00
Wilson Snyder
2358ced061 Rename tracing rolloverSize and add test (#3570). 2022-08-28 08:25:02 -04:00
Geza Lore
9ac64d0b92 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-20 21:18:50 +01:00
Wilson Snyder
90dc04cf93 Add --future0 and --future1 options. 2022-08-20 14:01:13 -04:00
Geza Lore
4d81eb021d Revert "Improve performance of MTask coarsening"
This reverts commit 83475008d9.
2022-08-19 18:03:45 +01:00
Geza Lore
83475008d9 Improve performance of MTask coarsening
Various optimizations to speed up MTasks coarsening (which is the long
pole in the multi-threaded scheduling of very large designs).

The biggest impact ones:
- Use efficient hand written Pairing Heaps for implementing priority
  queues and the scoreboard, instead of the old SortByValueMap. This
  helps us avoid having to sort a lot of merge candidates that we will
  never actually consider and helps a lot in performance.
- Remove unnecessary associative containers and store data structures
  (the heap nodes in particular) directly in the object they relate to.
  This eliminates a huge amount of lookups and helps a lot in
  performance.
- Distribute storage for SiblingMC instances into the LogicMTask
  instances, and combine with the sibling maps. This again eliminates
  hash table lookups and makes storage structures smaller.
- Remove some now bidirectional edge maps, keep only the forward map.

There are also some other smaller optimizations:
- Replaced more unnecessary dynamic_casts with static_casts
- Templated some functions/classes to reduce the number of static
  branches in loops.
- Improves sorting of edges for sibling candidate creation
- Various micro-optimizations here and there

This speeds up MTask coarsening by 3.8x on a large design, which
translates to a 2.5x speedup of the ordering pass in multi-threaded
mode. (Combined with the earlier optimizations, ordering is now 3x
faster.)

Due to the elimination of a lot of the auxiliary data structures, and
ensuring a minimal size for the necessary ones, memory consumption of
the MTask coarsening is also reduced (measured up to 4.4x reduction
though the accuracy of this is low).

The algorithm is identical except for minor alterations of the order
some candidates are added or removed, this can cause perturbation in the
output due to tied scores being broken based on IDs.
2022-08-19 16:59:20 +01:00
Wilson Snyder
f435d96241 Fix case statement comparing string literal (#3544). 2022-08-15 21:56:09 -04:00
Wilson Snyder
cbe1b8e266 Fix segfault exporting non-existant package (#3535). 2022-08-08 17:53:50 -04:00
Yutetsu TAKATSUKASA
1f9323d086
Set correct dtype in replaceShiftSame() (#3520)
* Tests: Add a test to reproduce bug3399

* Fix3399. Set the correct dtype in replaceShiftSame().

* Tests: update stats.

* Update Changes
2022-07-29 07:05:04 +09:00
Yutetsu TAKATSUKASA
60eab3eb8c
Fix wrong result of bit op tree optimization #3509 (#3516)
* Tests: Add a test to reproduce #3509

* Tests: Compile without tautological-compare check because bit op tree optimization is disabled in the test.

* Internals: Dedup code. No functional change is intended.

* Fix #3509.

"2'b10 == (2'b11 & {1'b0, val[0]})"  and "2'b10 != (2'b11 & {1'b0, val[0]})" were
wrongly optimized to "!val[0]" and "val[0]" respectively.
Now properly optimize them to 1'b0 and 1'b1.

* Commentary

* Commentary: Update Changes
2022-07-24 19:54:37 +09:00
Wilson Snyder
5f3316d3dc * Fix empty string arguments to display (#3484). 2022-07-09 08:30:57 -04:00
Wilson Snyder
a4fddb3fbe Fix table misoptimizing away display (#3488). 2022-07-09 07:55:46 -04:00
Yutetsu TAKATSUKASA
9f37cef1bb
Fix #3470 of incorrect bit op tree optimization (#3476)
* Tests: Add a test to reproduce #3470

* Update LSB during return path of traversal. No functional change is intended.

* Introduce LeafInfo::m_msb

* Update LeafInfo::m_msb when visitin AstCCast

* Internals: Add comment, reorder. No functional change is intended.

* Delete explicit from copy constructor to fix build error.

* Update Changes

* Internals: Remove unused parameter. No functional change is intended.

* Tests: Add explanation to t_const_opt.
2022-07-06 08:33:37 +09:00
Wilson Snyder
f2fba51fe2 devel release 2022-06-19 15:13:29 -04:00
Wilson Snyder
7c79f0d431 Version bump (Changes update) 2022-06-19 15:10:23 -04:00
Wilson Snyder
1fa82ffb7b Commentary: Update ChangeLog 2022-06-19 15:00:10 -04:00
Wilson Snyder
e7dc2de14b Fix BLKANDNBLK on $readmem/$writemem (#3379). 2022-06-04 12:43:18 -04:00
Wilson Snyder
59dc2853e3 Support concat assignment to packed array (#3446). 2022-06-03 21:32:13 -04:00
Wilson Snyder
ada58465b2 Add -f<optimization> options to replace -O<letter> options (#3436). 2022-06-03 20:43:16 -04:00
Wilson Snyder
173f57c636 Changed --no-merge-const-pool to -fno-merge-const-pool (#3436). 2022-06-03 19:41:59 -04:00
Geza Lore
b51f887567
Perform VCD tracing in parallel when using --threads (#3449)
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.

This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.

This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
2022-05-29 19:08:39 +01:00
Geza Lore
0722f47539
Improve V3MergeCond by reordering statements (#3125)
V3MergeCond merges consecutive conditional `_ = cond ? _ : _` and
`if (cond) ...` statements. This patch adds an analysis and ordering
phase that moves statements with identical conditions closer to each
other, in order to enable more merging opportunities. This in turn
eliminates a lot of repeated conditionals which reduced dynamic branch
count and branch misprediction rate. Observed 6.5% improvement on
multi-threaded large designs, at the cost of less than 2% increase in
Verilation speed.
2022-05-27 16:57:51 +01:00
Wilson Snyder
530817191e Support non-ANSI interface port declarations (#3439). 2022-05-25 00:50:50 -04:00
Geza Lore
c7610ed044 Fix FST tracing thread in CMake build 2022-05-20 17:04:46 +01:00
Geza Lore
ffc1c51526 Commentary 2022-05-20 16:44:53 +01:00
Geza Lore
b130a8cfeb Add -DVM_TRACE_VCD in model builds with Make with --trace 2022-05-20 16:44:38 +01:00
Wilson Snyder
7f1a9239ab Commentary, fix typo (#3121) 2022-05-15 11:14:07 -04:00
Geza Lore
c8102c8ffe Fix typo 2022-05-15 16:01:35 +01:00
Wilson Snyder
5aa12e9b51 Add assert when VerilatedContext is mis-deleted (#3121). 2022-05-15 10:51:03 -04:00
Wilson Snyder
3c4131d45d Fix 'with' operator with type casting (#3387). 2022-05-15 09:53:48 -04:00
Wilson Snyder
c2328ef46a Spelling fixes. 2022-05-14 16:12:57 -04:00
Wilson Snyder
71dedccbbe Support compile time trace signal selection with tracing_on/off (#3323). 2022-05-12 22:28:08 -04:00
Wilson Snyder
3d762282b9 Fix hang with large case statement optimization (#3405). 2022-05-05 07:02:52 -04:00
Wilson Snyder
30783e6a79 devel release 2022-05-02 22:23:05 -04:00
Wilson Snyder
aa86c777f4 Version bump 2022-05-02 22:17:20 -04:00
Wilson Snyder
267315e7d4 Commentary: Update ChangeLog 2022-05-01 22:01:30 -04:00
Geza Lore
49c90ecbce Issue consistent INITIALDLY/COMBDLY/BLKSEQ warnings
Some cases of warnings about the use of blocking and non-blocking
assignments in combinational vs sequential processes were suppressed in
a way that is inconsistent with the *actual* current execution model of
Verilator. Turning these back on to, well, warn the user that these might
cause unexpected results. V5 will clean these up, but until then err on
the side of caution.

Fixes #864.
2022-04-29 17:05:44 +01:00
Geza Lore
9abab2c366 Add separate AstInitialStatic node for static initializers
Static variable initializers run before initial blocks, so use an
explicitly different procedure type for them. This also enables us to
now raise errors for assignments to const variables in initial blocks.
2022-04-23 15:12:49 +01:00
Geza Lore
a9cd2998e5 Don't mangle run-time library method names. 2022-04-23 14:47:16 +01:00
Geza Lore
0b74e9b354 Ensure topological ordering of module list.
At the end of V3Param, fix up the module list to be topologically
sorted. We need to do this at the end as a later instantiation of a
recursive module might instantiate an earlier specialization, which we
cannot know until we processed everything. The rest of the compiler
depends on the module list being topologically sorted.

Fixes #3393
2022-04-23 13:25:27 +01:00