VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.
This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.
This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
V3MergeCond merges consecutive conditional `_ = cond ? _ : _` and
`if (cond) ...` statements. This patch adds an analysis and ordering
phase that moves statements with identical conditions closer to each
other, in order to enable more merging opportunities. This in turn
eliminates a lot of repeated conditionals which reduced dynamic branch
count and branch misprediction rate. Observed 6.5% improvement on
multi-threaded large designs, at the cost of less than 2% increase in
Verilation speed.
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278
Some cases of warnings about the use of blocking and non-blocking
assignments in combinational vs sequential processes were suppressed in
a way that is inconsistent with the *actual* current execution model of
Verilator. Turning these back on to, well, warn the user that these might
cause unexpected results. V5 will clean these up, but until then err on
the side of caution.
Fixes#864.
Static variable initializers run before initial blocks, so use an
explicitly different procedure type for them. This also enables us to
now raise errors for assignments to const variables in initial blocks.
At the end of V3Param, fix up the module list to be topologically
sorted. We need to do this at the end as a later instantiation of a
recursive module might instantiate an earlier specialization, which we
cannot know until we processed everything. The rest of the compiler
depends on the module list being topologically sorted.
Fixes#3393
The --prof-threads option has been split into two independent options:
1. --prof-exec, for collecting verilator_gantt and other execution
related profiling data, and
2. --prof-pgo, for collecting data needed for PGO
The implementation of execution profiling is extricated from
VlThreadPool and is now a separate class VlExecutionProfiler. This means
--prof-exec can now be used for single-threaded models (though it does
not measure a lot of things just yet). For consistency VerilatedProfiler
is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are
in verilated_profiler.{h/cpp}, but can be used completely independently.
Also re-worked the execution profile format so it now only emits events
without holding onto any temporaries. This is in preparation for some
future optimizations that would be hindered by the introduction of function
locals via AstText.
Also removed the Barrier event. Clearing the profile buffers is not
notably more expensive as the profiling records are trivially
destructible.
While GNU 'ar' supports '@' to specify a file, BSD 'ar' does not.
The max line length can be handled by 'xargs' instead, which will know
to break up the command. In case there are multiple calls, only build
the index (specified with '-s') once in a later call.
Using the 'forceable' directive in a configuration file, or the /*
verilator forceable */ metacomment on a variable declaration will
generate additional public signals that allow the specified signals to
be forced/released from the C++ code.
Trace initialization (tracep->decl* functions) used to explicitly pass
the complete hierarchical names of signals as string constants. This
contains a lot of redundancy (path prefixes), does not scale well with
large designs and resulted in .rodata sections (the string constants) in
ELF executables being extremely large.
This patch changes the API of trace initialization that allows pushing
and popping name prefixes as we walk the hierarchy tree, which are
prepended to declared signal names at run-time during trace
initialization. This in turn allows us to emit repeat path/name
components only once, effectively removing all duplicate path prefixes.
On SweRV EH1 this reduces the .rodata section in a --trace build by 94%.
Additionally, trace declarations are now emitted in lexical order by
hierarchical signal names, and the top level trace initialization
function respects --output-split-ctrace.
Verilator should now correctly re-evaluate any logic that depends on
state set in a DPI exported function, including if the DPI export is
called outside eval, or if the DPI export is called from a DPI import.
Whenever the design contains a DPI exported function that sets a
non-local variable, we create a global __Vdpi_export_trigger flag, that
is set in the body of the DPI export, and make all variables set in any
DPI exported functions dependent on this flag (this ensures correct
ordering and change detection on state set in DPI exports when needed).
The DPI export trigger flag is cleared at the end of eval, which ensured
calls to DPI exports outside of eval are detected. Additionally the
ordering is modifies to assume that any call to a 'context' DPI import
might call DPI exports by adding an edge to the ordering graph from the
logic vertex containing the call to the DPI import to the DPI export
trigger variable vertex (note the standard does not allow calls to DPI
exports from DPI imports that were not imported with 'context', so we
do not enforce ordering on those).
All parameters that are required in the output are now emitted as
'static constexpr, except for string or array of strings parameters,
which are still emitted as 'static const' (required as std::string is
not a literal type, so cannot be constexpr). This simplifies handling
of parameters and supports 'real' parameters.
This patch partitions AstCFuncs under an AstNodeModule based on which
header files they require for their implementation, and emits them
into separate files based on the distinct dependency sets. This helps
with incremental recompilation of the output C++.
The -G option now correctly parses simple integer literals as signed
numbers, which is in line with the standard and is significant when
overriding parameters without a type specifier.
Fixes#3060