__gcov_flush was a private function and was removed from later GCC
versions (at least from 11.2.0, possibly earlier). Replace with the
documented public __gcov_dump.
These have been 'deprecated' for 2 years and are otherwise unused except
for using a temporary placeholder value, which I have inlined with the
default value.
Also remove the now VL_TIME_STR_CONVERT utility function (and
corresponding unit tests), which have no references in any project on
GitHub.
All remaining use of conditional compilation in the tracing
implementation of the run-time library are replaced with the use of
VerilatedModel::traceConfig, and is now done at run-time.
Always fail if adding a model to a trace file that has already executed
a dump. We used to do this before as well, though in a less robust way.
We will be relying on this property more in the future, so improve the
check.
Always build the FST libray with -DFST_WRITER_PARALLEL, iff VL_THREADED.
This supports run-time enablement of the FST writer thread, and has no
measurable performance impact on single threaded tracing but simplifies
the library build.
Note: the actual choice of using the fst writer thread is still compile
time, but can now be made run-time easily.
Step towards a proper run-time library. Reduce the amount of ifdefs in
the implementation of offloaded tracing. There are still a very small
number of ifdefs left, which will need more careful changes in order to
keep user API compatibility.
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.
This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.
This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
This change is not a functional one; it is only meant to appease the
compiler with respect to warnings such as GCC's `-Wtype-limits`.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
This is a major re-design of the way code is scheduled in Verilator,
with the goal of properly supporting the Active and NBA regions of the
SystemVerilog scheduling model, as defined in IEEE 1800-2017 chapter 4.
With this change, all internally generated clocks should simulate
correctly, and there should be no more need for the `clock_enable` and
`clocker` attributes for correctness in the absence of Verilator
generated library models (`--lib-create`).
Details of the new scheduling model and algorithm are provided in
docs/internals.rst.
Implements #3278