This saves about 5% memory. V3AstUserAllocator is appropriate for most use
cases, performance is marginally up as we are mostly D-cache bound on
large designs.
The typical find/if-not-exists-insert pattern can be achieved with 1
lookup instead of 2 using emplace with a sentinel value. Also maps value
initialize their values when inserted with the [] operator, this is
defined and so there is no need to explicitly insert zeroes for integer
values.
This patch adds some abstract enums to pass to the trace decl* APIs, so
the VCD/FST specific code can be kept in verilated_{vcd,fst}_*.cc, and
removed from V3Emit*. It also reworks the generation of the trace init
functions (those that call 'decl*' for the signals) such that the scope
hierarchy is traversed precisely once during initialization, which
simplifies the FST writer. This later change also has the side effect of
fixing tracing of nested interfaces when traced via an interface
reference - see the change in the expected t_interface_ref_trace - which
previously were missed.
Some values emitted to the trace files are constant (e.g.: actual
parameter values), and never change. Previously we used to trace these
in the 'full' dumps, which also included all other truly variable
signals. This patch introduces a new generated trace function 'const',
to complement the 'full' and 'chg' flavour, and 'const' now only
contains the constant signals, while 'full' and 'chg' contain only the
truly variable signals. The generate 'full' and 'chg' trace functions
now have exactly the same shape. Note that 'const' signals are still
traced using the 'full*' dump methods of the trace buffers, so there is
no need for a third flavour of those.
V3Stats for "fast" code have bit-rotted a little and is causing some
problems with tests that rely on stats outputs. The problem is that not
all code is necessarily reachable from eval() any more (due to the
complexity of some the features added over the past few years), so it
might miss some things, as for measuring the "fast" code, it is trying
to trace the execution paths via calls, starting from eval(). It also
appears the fast code can also contain calls to slow code in some
circumstances.
To avoid all that, removed trying to trace dynamic execution, and simply
report the static node counts, which is enough for testing.
Similarly, the variable counts are somewhat dubious, as they don't
include all data types, or all instances of a module in some stages.
Removing these as they are not widely used nor dependable. More specific
stats can be added if required and can be well defined.
Prior to this we failed to create implicit nets for inputs of gate
primitives, which is required by the standard (IEEE 1800-2017 6.10).
Note: outputs were covered due to being modeled as the LHS of
assignments, which do create implicit nets.
Again --prof-exec have bit-rotted a little with all the recent changes
to the structure of the generated code. This patch contains a few
improvements:
- Repalce the eval/evl_loop begin/end events with generic
section_push/section_pop events, that can be arbitrarily sprinkled
into the generate code (so long as they are matched correctly) to
measure various sections. The report then contains a nested profile
of the sections, and the VCD trace shows the section names.
- Better handling of exec graphs
- Clearer overall statistics
Still some remains of the --threads 0 mode. Remove unnecessary complexity
from V3EmitCModel. (Also don't pretend there is an MTask in single
threaded mode, when there really isn't.)
It's unlikely one value fits all use case, so making VL_LOCK_SPINS
configurable at model build time.
For testing, we reduce the value as we expect high contention.
Previously V3InstrCount used to completely ignore an AstConcat,
including its children (see the rational in the comment). The problem is
the operands can be huge and expensive compound expressions (especially
since DFG), and not just a simple variable reference. This fix gains
some MT speed improvement.
Since we removed --threads 0 support, the 'threads()' option always
returns a value >= 1. Remove corresponding dead code.
Some of the coverage counters appear to use atomics even if the model is
single threaded. I'm under the impression this was a bug originally so
those ones I changed to use threads() > 1 instead.
According to 1800-2017 36.3, 1800-2017 A.9.3, 1364-2005 20.2 and 1364-2005 A.9.3, user defined system task and function identifiers can use the same character set for the second character as all the following characters.