Commit Graph

719 Commits

Author SHA1 Message Date
Geza Lore
47bce4157d
Introduce DFG based combinational logic optimizer (#3527)
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.

This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.

The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.

For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.

The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.

Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
2022-09-23 16:46:22 +01:00
Geza Lore
ddb678cc5b Merge branch 'master' into develop-v5 2022-09-22 17:33:36 +01:00
Geza Lore
63c694f65f Streamline dump control options
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
  special case of `--dumpi-<tag>` where tag can be a magic word, or a
  filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
  works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
  special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
2022-09-22 17:24:41 +01:00
Wilson Snyder
d162619bd3 Merge branch 'master' into develop-v5 2022-09-20 20:06:21 -04:00
Wilson Snyder
fc4ffd454e Rename --bin to --build-dep-bin. 2022-09-18 10:32:43 -04:00
Wilson Snyder
1234ee5fd2 Merge branch 'master' into develop-v5 2022-09-17 08:02:25 -04:00
Wilson Snyder
80b73859a2 Commentary: Some fixes from 'develop-v5' 2022-09-17 08:00:40 -04:00
Geza Lore
af305bf280 Merge branch 'master' into develop-v5 2022-09-16 16:24:36 +01:00
Wilson Snyder
ab6e1c2399 Commentary on --main 2022-09-15 20:26:08 -04:00
Kamil Rakoczy
da20da264b
Add --build-jobs, and rework arguments for -j (#3623) 2022-09-15 08:28:58 -04:00
Geza Lore
27031ed688 Merge branch 'master' into develop-v5 2022-09-15 10:28:35 +01:00
Wilson Snyder
75fd71d7e5 Add --main to generate main() C++ (previously was experimental only) (#3265). 2022-09-14 20:18:40 -04:00
Wilson Snyder
361cef4633 Fix pylint warnings. 2022-09-07 21:48:52 -04:00
Krzysztof Bieganski
39af5d020e
Timing support (#3363)
Adds timing support to Verilator. It makes it possible to use delays,
event controls within processes (not just at the start), wait
statements, and forks.

Building a design with those constructs requires a compiler that
supports C++20 coroutines (GCC 10, Clang 5).

The basic idea is to have processes and tasks with delays/event controls
implemented as C++20 coroutines. This allows us to suspend and resume
them at any time.

There are five main runtime classes responsible for managing suspended
coroutines:
* `VlCoroutineHandle`, a wrapper over C++20's `std::coroutine_handle`
  with move semantics and automatic cleanup.
* `VlDelayScheduler`, for coroutines suspended by delays. It resumes
  them at a proper simulation time.
* `VlTriggerScheduler`, for coroutines suspended by event controls. It
  resumes them if its corresponding trigger was set.
* `VlForkSync`, used for syncing `fork..join` and `fork..join_any`
  blocks.
* `VlCoroutine`, the return type of all verilated coroutines. It allows
  for suspending a stack of coroutines (normally, C++ coroutines are
  stackless).

There is a new visitor in `V3Timing.cpp` which:
  * scales delays according to the timescale,
  * simplifies intra-assignment timing controls and net delays into
    regular timing controls and assignments,
  * simplifies wait statements into loops with event controls,
  * marks processes and tasks with timing controls in them as
    suspendable,
  * creates delay, trigger scheduler, and fork sync variables,
  * transforms timing controls and fork joins into C++ awaits

There are new functions in `V3SchedTiming.cpp` (used by `V3Sched.cpp`)
that integrate static scheduling with timing. This involves providing
external domains for variables, so that the necessary combinational
logic gets triggered after coroutine resumption, as well as statements
that need to be injected into the design eval function to perform this
resumption at the correct time.

There is also a function that transforms forked processes into separate
functions.

See the comments in `verilated_timing.h`, `verilated_timing.cpp`,
`V3Timing.cpp`, and `V3SchedTiming.cpp`, as well as the internals
documentation for more details.

Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
2022-08-22 13:26:32 +01:00
Wilson Snyder
b25b798dbe Merge branch 'master' into develop-v5 2022-07-04 13:20:03 -04:00
Wilson Snyder
fa99cbbc73 Commentary: Fix mis-sorted option names. No functional change. 2022-06-21 19:28:26 -04:00
Wilson Snyder
0f324c8309 Merge branch 'master' into develop-v5 2022-06-04 11:59:49 -04:00
Wilson Snyder
67f7432dd7 Commentary (#3436). 2022-06-04 08:37:42 -04:00
Wilson Snyder
ada58465b2 Add -f<optimization> options to replace -O<letter> options (#3436). 2022-06-03 20:43:16 -04:00
Wilson Snyder
173f57c636 Changed --no-merge-const-pool to -fno-merge-const-pool (#3436). 2022-06-03 19:41:59 -04:00
Geza Lore
b51f887567
Perform VCD tracing in parallel when using --threads (#3449)
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.

This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.

This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
2022-05-29 19:08:39 +01:00
Geza Lore
599d23697d
IEEE compliant scheduler (#3384)
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.

With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).

Details of the new scheduling model and algorithm are provided in
docs/internals.rst.

Implements #3278
2022-05-15 16:03:32 +01:00
Wilson Snyder
e02f97854c Deprecate 'vluint64_t' and similar types (#3255). 2022-03-27 15:27:40 -04:00
Geza Lore
b1b5b5dfe2 Improve run-time profiling
The --prof-threads option has been split into two independent options:
1. --prof-exec, for collecting verilator_gantt and other execution
related profiling data, and
2. --prof-pgo, for collecting data needed for PGO

The implementation of execution profiling is extricated from
VlThreadPool and is now a separate class VlExecutionProfiler. This means
--prof-exec can now be used for single-threaded models (though it does
not measure a lot of things just yet). For consistency VerilatedProfiler
is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are
in verilated_profiler.{h/cpp}, but can be used completely independently.

Also re-worked the execution profile format so it now only emits events
without holding onto any temporaries. This is in preparation for some
future optimizations that would be hindered by the introduction of function
locals via AstText.

Also removed the Barrier event. Clearing the profile buffers is not
notably more expensive as the profiling records are trivially
destructible.
2022-03-27 15:57:30 +02:00
Wilson Snyder
4eaa6fdd06 Internals: Use python pass appropriately. No functional change intended. 2022-03-26 15:57:52 -04:00
Wilson Snyder
d3b63b2653 Fix error if file not found 2022-02-09 21:56:22 -05:00
Wilson Snyder
ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Wilson Snyder
899de9a282 Add --lib-create, similar to --protect-lib but without protections (#3200). 2021-11-14 09:39:31 -05:00
Wilson Snyder
2560fc867f verilator_gantt: Fix reading broken /cpu/procinfo reports 2021-10-02 11:10:43 -04:00
Wilson Snyder
9029da5ab8 Add profile-guided optmization of mtasks (#3150). 2021-09-26 22:51:11 -04:00
Wilson Snyder
741bb5328e verilator_gantt: Fix argument report omitting last digits 2021-09-24 21:11:15 -04:00
Wilson Snyder
8ab51dbf22 Verilator_gantt: remove ASCII graphics 2021-09-24 08:48:20 -04:00
Wilson Snyder
fd4595d6b4 verilator_gantt: Add eval count to report 2021-09-24 08:48:20 -04:00
github action
204804ae52 Apply 'make format' 2021-09-24 03:00:42 +00:00
Wilson Snyder
c2819923c5 Verilator_gantt now shows the predicted mtask times, eval times, and additional statistics. 2021-09-23 22:59:36 -04:00
Wilson Snyder
97d8d32049 Commentary 2021-09-17 18:52:12 -04:00
Wilson Snyder
f1b8b1d99b Format: perltidy spacing cleanup. No functional change. 2021-09-08 18:45:25 -04:00
Wilson Snyder
4b274a8d4d Convert verilator_gantt to python 2021-09-08 08:16:31 -04:00
Wilson Snyder
c678e7ec3e Format: perltidy spacing cleanup. No functional change. 2021-09-07 23:50:28 -04:00
Wilson Snyder
d09b6a7d2c Include processor information in verilator_gantt data file. 2021-09-05 11:56:28 -04:00
Wilson Snyder
35eac0c457 Commentary: Fix profcfunc naming. 2021-09-05 11:33:20 -04:00
Wilson Snyder
cac0b0f316 Tests: Fix t_gantt_io 2021-09-04 16:51:55 -04:00
Wilson Snyder
1cb8091125 verilator_profcfunc: Also allow eval_step. 2021-09-04 12:28:26 -04:00
Wilson Snyder
18c543f2f2 Convert verilator_difftree to Python 2021-09-04 09:31:22 -04:00
Wilson Snyder
1ee46ac5e1 Convert verilator_profcfunc to python. 2021-09-04 08:21:09 -04:00
Wilson Snyder
56dc66d842 Fix verilator_profcfunc profile accounting (#3115). 2021-09-03 19:59:10 -04:00
Geza Lore
cdeb6e792f Add --instr-count-dpi option, change default to 200
This replaces the former static AstNode::INSTR_COUNT_DPI, and makes it
user adjustable to fit the design.

Fixes #3068.
2021-07-25 16:40:12 +01:00
Wilson Snyder
36599133bf Add --prof-c to pass profiling to compiler (#3059). 2021-07-07 19:12:52 -04:00
Geza Lore
ec1c112791
Remove deprecated --inhibit-sim (#3035) 2021-06-21 12:38:42 -04:00
Geza Lore
9eafca5e28
Remove deprecated --no-relative-cfuncs (#3024) 2021-06-16 23:17:43 -04:00
Wilson Snyder
31121141db Convert pipe filter example to python 2021-06-13 12:03:53 -04:00
Geza Lore
c207e98306
Implement a distinct constant pool (#3013)
What previously used to be per module static constants created in
V3Table and V3Prelim are now merged globally within the whole model and
emitted as part of a separate constant pool. Members of the constant
pool are global variables which are declared lazily when used (similar to
loose methods).
2021-06-13 15:05:55 +01:00
Geza Lore
5555f20bd2 Improve ccache-report 2021-06-09 19:14:11 +01:00
github action
c238d3da76 Apply clang-format 2021-06-06 23:57:41 +00:00
Geza Lore
0edf1f0c94
Add ccache-report target to standard Makefile (#3011)
Using the standard model Makefile, when in addition to an explicit
target, the target 'ccache-report' is also given, a summary of ccache
hits/misses during this invocation of 'make' will be prited at the end
of the build.
2021-06-07 00:56:30 +01:00
Wilson Snyder
1e89392e76 Add --expand-limit argument (#3005). 2021-06-06 10:27:01 -04:00
Geza Lore
38cab569ed
Add --reloop-limit argument (#2960)
Add --reloop-limit argument
2021-05-15 18:04:40 +01:00
Wilson Snyder
cf7f49e139 Docs: Fix cross-references 2021-04-13 09:25:11 -04:00
Wilson Snyder
adce7ecf4b Documentation has been rewritten into a book format. 2021-04-11 18:55:06 -04:00
Udi Finkelstein
23c243bb82
Add support for null ports (#2875) 2021-04-09 10:39:46 -04:00
Wilson Snyder
efc116323b Commentary 2021-04-06 18:07:28 -04:00
Àlex Torregrosa
a29ac44af9
Add FST SystemC tracing (#2806) 2021-04-06 16:18:58 -04:00
Wilson Snyder
5658d7238d Commentary 2021-04-03 13:11:26 -04:00
Udi Finkelstein
0ea5af40c5
Add PINNOTFOUND warning in place of "Pin not found" error (#2868) 2021-04-01 18:17:42 -04:00
Wilson Snyder
c62546c761 Add --coverage-max-width (#2853). 2021-03-29 18:54:51 -04:00
Wilson Snyder
02d82978e5 Avoid quoting + to appease cygwin bash/gcc --gdbbt (#2856) 2021-03-26 22:13:55 -04:00
Wilson Snyder
96f9f8558b Mark --no-relative-cfuncs as scheduled for deprecation. 2021-03-17 18:59:45 -04:00
Wilson Snyder
5da9368032 Update contributors. 2021-03-17 18:39:32 -04:00
Wilson Snyder
dfab80fab1 Fix false TIMESCALEMOD on generate-ignored instances (#2838). 2021-03-16 22:52:29 -04:00
Wilson Snyder
12eb4e85ac Changed TIMESCALEMOD from error into a warning. (#2838) 2021-03-16 21:58:15 -04:00
Wilson Snyder
daf5174134 Commentary 2021-03-14 21:26:56 -04:00
Wilson Snyder
8350c381c2 Add EOFNEWLINE warning when missing a newline at EOF. 2021-03-14 21:23:48 -04:00
Wilson Snyder
5022e81af7 Commentary 2021-03-12 14:14:21 -05:00
Wilson Snyder
c99f01b7fe Converted Asciidoc documentation into reStructuredText (RST) format. 2021-03-12 13:52:47 -05:00
Wilson Snyder
2cde7b0128 Commentary 2021-03-12 08:50:24 -05:00
Wilson Snyder
2cad22a22a
Add simulation context (VerilatedContext) (#2660). (#2813)
**   Add simulation context (VerilatedContext) to allow multiple fully independent
      models to be in the same process.  Please see the updated examples.
**   Add context->time() and context->timeInc() API calls, to set simulation time.
      These now are recommended in place of the legacy sc_time_stamp().
2021-03-07 11:01:54 -05:00
Wilson Snyder
caa9c99837 Commentary 2021-03-07 08:28:13 -05:00
Wilson Snyder
7cc7afca04 Commentary 2021-03-02 17:39:16 -05:00
Wilson Snyder
fec5e69ec5 --inhibit-sim is planned for deprecation, file a bug if this is still being used. 2021-02-28 09:26:06 -05:00
Wilson Snyder
e44563fddc Tests: Use vl_time_stamp64 where reasonable 2021-02-16 20:14:30 -05:00
Yutetsu TAKATSUKASA
f2a7f30b09
Commentary: In docs remove +I (#2789) 2021-02-15 08:40:49 -05:00
Morten Borup Petersen
843ae2955e
Commentary on incorrectly specified debug level (#2777)
As seen at https://github.com/verilator/verilator/blob/master/src/V3Options.cpp#L1202
setting --debug enables a debug level of 3.
2021-02-03 14:38:34 -05:00
Wilson Snyder
3c79e00d24 Commentary (#2764) 2021-01-24 20:24:23 -05:00
Julien Margetts
a11700271f
Add LATCH and NOLATCH warnings (#1609) (#2740). 2021-01-05 14:26:01 -05:00
Wilson Snyder
bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder
b04a1caeac In warnings, rename cells to instances to match IEEE 2020-12-12 22:43:55 -05:00
Krzysztof Bieganski
5e7b0d526d
Support 'randc' as alias to 'rand' (#2680)
* Add alias 'randc' to 'rand'

* Make the 'RANDC' warning; add tests
2020-12-09 19:17:30 -05:00
Wilson Snyder
3c680ddf22 Tests: Test some warnings without tests. 2020-12-07 20:30:16 -05:00
Wilson Snyder
47eeef485d Report UNUSED on parameters, localparam and genvars (#2627). 2020-12-07 19:49:50 -05:00
Wilson Snyder
74ef35d3b3 Support $cast and new CASTCONST warning. 2020-12-05 22:58:36 -05:00
Wilson Snyder
8582aed66a Add --top option as alias of --top-module. 2020-12-05 16:58:17 -05:00
Wilson Snyder
665e8edaff Support $monitor and $strobe. 2020-11-29 11:31:38 -05:00
Wilson Snyder
cf2810db8b Change -sv option to select 1800-2017 instead of 1800-2005. 2020-11-27 21:49:47 -05:00
Wilson Snyder
103ba1fb6d Commentary 2020-11-20 21:48:11 -05:00
Wilson Snyder
b684995292 Support $random and $urandom seeds. 2020-11-19 21:32:33 -05:00
Wilson Snyder
c64cc989f0 verilator_gantt: Show CPU physical info. 2020-11-14 12:22:15 -05:00
Wilson Snyder
5d3482734a Fix precision in verilator_gantt. 2020-11-13 18:51:45 -05:00
Wilson Snyder
87b2adb6a1 With --debug, turn off address space layout randomization. 2020-11-12 18:49:49 -05:00
Wilson Snyder
6dfce882a1 Support $exit as alias of $finish 2020-11-10 22:49:48 -05:00
Wilson Snyder
f4ed367850 Internals: Remove verilator_difftree old cleanups. 2020-10-15 21:36:35 -04:00