All remaining use of conditional compilation in the tracing
implementation of the run-time library are replaced with the use of
VerilatedModel::traceConfig, and is now done at run-time.
Step towards a proper run-time library. Reduce the amount of ifdefs in
the implementation of offloaded tracing. There are still a very small
number of ifdefs left, which will need more careful changes in order to
keep user API compatibility.
VCD tracing is now parallelized using the same thread pool as the model.
We achieve this by breaking the top level trace functions into multiple
top level functions (as many as --threads), and after emitting the time
stamp to the VCD file on the main thread, we execute the tracing
functions in parallel on the same thread pool as the model (which we
pass to the trace file during registration), tracing into a secondary
per thread buffer. The main thread will then stitch (memcpy) the buffers
together into the output file.
This makes the `--trace-threads` option redundant with `--trace`, which
now only affects `--trace-fst`. FST tracing uses the previous offloading
scheme.
This obviously helps a lot in VCD tracing performance, and I have seen
better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on
OpenTitan 4T).
Trace initialization (tracep->decl* functions) used to explicitly pass
the complete hierarchical names of signals as string constants. This
contains a lot of redundancy (path prefixes), does not scale well with
large designs and resulted in .rodata sections (the string constants) in
ELF executables being extremely large.
This patch changes the API of trace initialization that allows pushing
and popping name prefixes as we walk the hierarchy tree, which are
prepended to declared signal names at run-time during trace
initialization. This in turn allows us to emit repeat path/name
components only once, effectively removing all duplicate path prefixes.
On SweRV EH1 this reduces the .rodata section in a --trace build by 94%.
Additionally, trace declarations are now emitted in lexical order by
hierarchical signal names, and the top level trace initialization
function respects --output-split-ctrace.
Reasons are:
- it's error prone to keep updating whennever m_suffixes is resized
- invalid pointer may be set when there is not signal to trace as in t_trace_dumporder_bad
* Add VL_ATTR_NO_SANITIZE_ALIGN macro to disable alignment check of ubsan
* Mark a function VL_ATTR_NO_SANITIZE_ALIGN because the function is intentionally using unaligned access for the sake of performance.
Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>
** Add simulation context (VerilatedContext) to allow multiple fully independent
models to be in the same process. Please see the updated examples.
** Add context->time() and context->timeInc() API calls, to set simulation time.
These now are recommended in place of the legacy sc_time_stamp().
* Fix memory leak of t_trace_cat and t_trace_cat_renew
* Fix memory leak of t_trace_c_api
* Fix memory leak in t_trace_public_func and t_trace_public_func_vlt
* Fix memory leaks in t_flat_build (and probably more).
* Use unique_ptr in testcases