Other sub-classes of AstNodeCCall do not need the self pointer. Moving
it into the specific sub-class that needs it clarifies V3Descope and
Emit. No functional change intended.
Internal AstNodeModule headers (.h) and implementation (.cpp) files are
now emitted separately in V3EmitC::emitcHeaders() and
V3EmitC::emitcImp() respectively. No functional change intended
A separate V3VariableOrder pass is now used to order module variables
before Emit. All variables are now ordered together, without
consideration for whether they are ports, signals form the design, or
additional internal variables added by Verilator (which used to be
ordered and emitted as separate groups in Emit). For single threaded
models, this is performance neutral. For multi-threaded models, the
MTask affinity based sorting was slightly modified, so variables with no
MTask affinity are emitted last, otherwise the MTask affinity sets are
sorted using the TSP sorter as before, but again, ports, signals, and
internal variables are not differentiated. This yields a 2%+ speedup for
the multithreaded model on OpenTitan.
This patch makes OpenTitan verilation with --debug-check 22% faster, and
the same with --debug --no-dump-tree 91% faster. Functionality is the
same (including when VL_LEAK_CHECKS is defined), except V3Broken can now
always find duplicate references via child/next pointers if the target
node is not `maybePointedTo()` (previously this only happened when
compiled with VL_LEAK_CHECKS). The main change relates to storing the
v3Broken traversal state in the AstNode by stealing a byte from what
used to be unused flags. We retain an unordered_set only for marking
pointers as valid to be referenced via a non-child/non-next member
pointer.
Do not sort and hoist function local variables to the top of the
function definition. The stack layout of automatic variables is not
defined by C so the compilers can lay these out optimally. Simplifies
internals for follow on work. Effect on model performance is neutral to
very slight improvement, so we do not seem to be loosing anything.
V3Broken now checks that AstVar nodes referenced in an AstCFunc are
either external, or appear in the tree before the reference, and are in
scope.
Fix V3Begin to move lifted AstVars to beginning of FTask, rather than
end, which trips the above check.
debug() declared by VL_DEGUB_FUNC used to cache the result of the debug
level lookup (which depends on options) in a static. This meant that if
the debug() function was called before option parsing, the default debug
level of 0 would be used for the rest of the program, even if a --debug
option was given. Fixed by not caching the debug level until after
option parsing is complete.
The bool AstNode flags fit in a padding gap after m_type. This reduces
memory consumption by about 2% on OpenTitan. Initializing the unused
bits in the flags then avoids a read-modify-write in the constructor
(replacing it with a store constant). Overall verilation speed is about
1% faster.
The -G option now correctly parses simple integer literals as signed
numbers, which is in line with the standard and is significant when
overriding parameters without a type specifier.
Fixes#3060
Remove magic code fragments form EmitCTrace, so Emit need not be aware
that a function is tracing related or not (apart from the purpose of
file name generation). All necessary code is now generated via text
nodes in V3TraceDecl and V3Trace. No functional change intended.
For consistency with the rest of the generated code, generated methods
related to tracing now use snake_case instead of camelCase. No
functional change intended.
* Move MTaskState to ThreadSchedule
MTaskState does not concern itself with sandbagging, and thus solely contains information related to the finalized schedule, i.e., completion time, thread ID and next MTask on thread.
* Add .dot graph visualization of ThreadSchedule
Follow-up to #2779.
This commit adds the creation of .dot files - used by GraphViz - to visualize how mtasks are statically scheduled across the set of specified threads.
We visualize each thread as a row, with nodes of a row being the mtasks scheduled for the given thread. The width of the mtask nodes are proportional to their cost. MTask dependencies are shown using an edge between the source and sink mtasks.
V3Simulate reuses allocated AstConst nodes for efficiency, however this
used to be implemented in a way that required a deep copy of a
std::unorderd_map<_, std::deque<_>>, which was quite inefficient when it
grew large. The free list is now managed without any copying. This takes
the V3Table pass from taking 12s to 0.2s on SweRV EH1.
We incorrectly treated two different struct types the same when passed
as an actual parameter to a `parameter type` parameter in an instance,
if the actual parameter expression both hash to the same value and the
structs have the same struct name. This is now corrected.
Fixes#3055.
This patch implements #3032. Verilator creates a module representing the
SystemVerilog $root scope (V3LinkLevel::wrapTop). Until now, this was
called the "TOP" module, which also acted as the user instantiated model
class. Syms used to hold a pointer to this root module, but hold
instances of any submodule. This patch renames this root scope module
from "TOP" to "$root", and introduces a separate model class which is
now an interface class. As the root module is no longer the user
interface class, it can now be made an instance of Syms, just like any
other submodule. This allows absolute references into the root module to
avoid an additional pointer indirection resulting in a potential speedup
(about 1.5% on OpenTitan). The model class now also contains all non
design specific generated code (e.g.: eval loops, trace config, etc),
which additionally simplifies Verilator internals.
Please see the updated documentation for the model interface changes.
This commit removes shadowing of the vlSymsp member of the emitted
modules, allowing models to compile when -Werror=shadow is set. This may
be useful when i.e., an external project which defines its own error
flags depends on the verilated model.
Factored out bits from V3EmitC.cpp that is required to emit a whole
(non-trace) AstCFunc. This is mostly what used to be the EmitCStmts
class plus relevant bits from EmitCImp. These now live in EmitCFunc,
which is reusable by anything that needs to emit a regular AstCFunc
(differences in tracing to be addressed later). EmitCImp now extends
EmitCFunc instead of EmitCStmts. No functional change intended.
Moved these 2 function into V3EmitCBase so we can reuse them later.
emitVarDecl required minor alteration to move building of m_ctorVarsVec
back into V3EmitC (which is now done in V3EmitC::emitSortedVarList).
No functional change intended.
* Add a test to reproduce #3023. Also applied verilog-mode formatting.
* use unique_ptr. No functional change is intended.
* Introduce restorer that reverts changes during iterate() if failed.
This used to be done in the constructor of the top module, but there is
no reason to do it there. Internals are cleaner with this in the Sym
constructor. No functional change intended.
AND(CONST,SHIFTR(_,C)) appears often after V3Expand, with C a large
enough dense mask (i.e.: of the form (1 << n) - 1) to make the masking
redundant. E.g.: 0xff & ((uint32_t)a >> 24). V3Const now replaces these
ANDs with the SHIFTR node.
Similarly, we also simplify the same with SHIFTL,
e.g.: 0xff000000 & ((uint32_t)a << 24)
V3Expand generates a lot of OR nodes that are under a clearing mask, and
have redundant terms, e.g.: 0xff & (a << 8 | b >> 24). The 'a << 8' term
in there is redundant as it's bottom bits are all zero where the mask is
non-zero. V3Const now removes these redundant terms.
Teach V3Localize how to localize variables that are used in multiple
functions, if in all functions where they are used, they are always
written in whole before being consumed. This allows a lot more variables
to be localized (+20k variables on OpenTitan - when building without
--trace), and can cause significant performance improvement (OpenTitan
simulates 8.5% - build single threaded and withuot --trace).
These utility classes can be used to hang advanced data structures off
AstNode user*u() pointers, and they take care of memory management for
the client. Use via the call operator().
V3Localize can now localize variable references that reference variables
located in scopes different from the referencing function. This also
means V3Descope has now moved after V3Localize.
When part selecting bits via an AstSel in a VlWide, V3Expand used to do
something akin to:
word_index = lsb / 32;
bit_index = lsb % 32;
result =
wide[word_index + 1] << (32 - bit_index) | wide[word_index] >> bit_index;
The unconditional "+ 1" can cause an out of bounds access into the
VlWide, when the whole of the select is into the most significant word
(i.e.: when word_index is already the most significant word). We now
emit roughly this instead:
lo_word_index = lsb / 32;
bit_index = lsb % 32;
hi_word_index = (lsb + width - 1) / 32;
result =
wide[hi_word_index] << (32 - bit_index) | wide[lo_word_index] >> bit_index;
i.e.: we explicitly calculate which word the MSB of the select falls
into, and address that word, rather than the unconditional + 1. The
shifts ensure we still yield the right result, even if lo_word_index and
hi_word_index are the same.
Note: The actual expression created by V3Expand can be a bit more
complicated as we might need to access 3 words when the result is a
QData, all 3 word indices are calculated explicitly.
emitVarCmtChg used to emit MTask affinity of variables in comments in
the generated header. This causes unnecessary changes in the output when
scheduling changes slightly between compilation, hindering ccache reuse.
If needing this info for debugging Verilator, add a separate dump file
instead of emitting it in the generated code.
The goal of this patch is to move functionality related to constructing
the thread entry points and then invoking them out of V3EmitC (and into
V3Partition). The long term goal being enabling V3EmitC to emit
functions partitioned based on header dependencies. V3EmitC having to
deal with only AstCFunc instances and no other magic will facilitate
this.
In this patch:
- We construct AstCFuncs for each thread entry point in
V3Partition::finalize and move AstMTaskBody nodes under these functions.
- Add the invocation of the threads as text statements within the
AstExecGraph, so they are still invoked where the exec graph is located.
(the entry point functions are still referenced via AstCCall or
AstAddOrCFunc, so lazy declarations of referenced functions are created
automatically).
- Explicitly handle MTask state variables (VlMTaskVertex in
verilated_threads.h) within Verilator, so no need to text bash a lot of
these any more (some text refs still remain but they are all created
next to each other within V3Partition.cpp).
The effect of all this on the emitted code should be nothing but some
identifier/ordering changes. No functional change intended.
What previously used to be per module static constants created in
V3Table and V3Prelim are now merged globally within the whole model and
emitted as part of a separate constant pool. Members of the constant
pool are global variables which are declared lazily when used (similar to
loose methods).
This patch introduces the concept of 'loose' methods, which semantically
are methods, but are declared as global functions, and are passed an
explicit 'self' pointer. This enables these methods to be declared
outside the class, only when they are needed, therefore removing the
header dependency. The bulk of the emitted model implementation now uses
loose methods.
Check the C++ compiler for -Og via configure and use it if available.
Per the GCC manual:
-Og should be the optimization level of choice for the standard
edit-compile-debug cycle, offering a reasonable level of optimization
while maintaining fast compilation and a good debugging experience. It
is a better choice than -O0 for producing debuggable code because some
compiler passes that collect debug information are disabled at -O0.
The debug exe is painfully slow on large designs, hopefully this is an
improvement.
Similarly, check for and use -gz to compress the debug info as it is
huge otherwise. This should help with distribution and caching on CI.
Also checks for -ggdb via configure for compatibility.
- Initialize variable to avoid 'may be uninitialized' warning
- More reliable segfault (the previous version was compiled into an
undefined instruction by clang sometimes, thew new one is always a store
to zero).
- Better combining, without multiplication, which means a 0 hash value
is now allowed.
- Do not OR in bottom bits (this was used to avoid a 0 hash but had the
side effect of hashing 0 and 1 to the same value, which are actually
common inputs.
- Hash whole content of V3Number. This does not seem to be noticeable in
runtime, but quite often the bottom word can be a special value like
zero while the rest of the content varies.
A few names were incorrectly mangled, which made --protect-ids produce
invalid output when certain SV class constructs were uses. Now fixed and
added a few extra tests to catch this.
// was so far unconditionally treated as comment.
c443e229ee introduced a string literal in
the output that contained a http:// url, which broke the indent
tracking. No functional change intended
astgen now generates ASTGEN_SUPER_* macros, instead of expanding the
ASTGEN_SUPER itself. This means that V3AstNodes.h itself can be included
in V3Ast.h, which helps IDEs (namely CLion) find definitions in
V3AstNodes.h a lot better. Albeit this is a little bit more boilerplate,
writing constructors of Ast nodes should be a lot rarer than trying to
find their definitions, so hopefully this is an improvement overall.
astgen will error if the developer calls the wrong superclass
constructor.
Rework Ast hashing to be stable
Eliminated reliance on pointer values in AstNode hashes in order to make
them stable. This requires moving the sameHash functions into a visitor,
as nodes pointed to via members (and not child nodes) need to be hashed
themselves.
The hashes are now stable (as far as I could test them), and the impact
on verilation time is small enough not to be reliably measurable.
V3Hasher is responsible for computing AstNode hashes, while V3DupFinder
can be used to find duplicate trees based on hashes. Interface of
V3DupFinder simplified somewhat. No functional change intended at this
point, but hash computation might differ in minor details, this however
should have no perceivable effect on output/runtime.
Implements (#2964)
- Remove dead code
- Simplify what is left
- Minor improvements using C++11
- Remove special case AstNode constructors used only in one place in
V3Combine (inlined instead).
No functional changes intended. Boring patch in the hope of being
helpful to new contributors. Although AstNode* base classes needing to
be abstract is described in the internals manual, this might provide a
bit of extra safety.
This seems to belong there, eliminates some code duplication in V3Clock,
and also enables splitting AstAlwaysPost statements into different
functions in V3Order, but should otherwise have little effect.
CFuncs only used to be split at procedure (always/initial/final block),
which on occasion can still yield huge output files if they have large
procedures. This patch make CFuncs split at statement boundaries within
procedures. This has the potential to help a lot, but still does not
help if there are huge statements within procedures.
V3Reloop now can roll up indexed assignments between arrays if there is a
constant offset between indices on the left and right hand sides, e.g.:
a[0] = b[2];
a[1] = b[3];
...
a[x] = b[x + 2];