This saves about 5% memory. V3AstUserAllocator is appropriate for most use
cases, performance is marginally up as we are mostly D-cache bound on
large designs.
Prior to this we failed to create implicit nets for inputs of gate
primitives, which is required by the standard (IEEE 1800-2017 6.10).
Note: outputs were covered due to being modeled as the LHS of
assignments, which do create implicit nets.
Again --prof-exec have bit-rotted a little with all the recent changes
to the structure of the generated code. This patch contains a few
improvements:
- Repalce the eval/evl_loop begin/end events with generic
section_push/section_pop events, that can be arbitrarily sprinkled
into the generate code (so long as they are matched correctly) to
measure various sections. The report then contains a nested profile
of the sections, and the VCD trace shows the section names.
- Better handling of exec graphs
- Clearer overall statistics
According to 1800-2017 36.3, 1800-2017 A.9.3, 1364-2005 20.2 and 1364-2005 A.9.3, user defined system task and function identifiers can use the same character set for the second character as all the following characters.
This API is used if the user copies the process using `fork`
and similar OS-level mechanisms. The `at_clone` member function
ensures that all model-allocated resources are re-allocated, such
that the copied child process/model can simulate correctly.
A typical allocated resource is the thread pool, which every model
has its own pool.
This makes the implementation of the detection and propagation of the
suspendable property simpler and easier to read. More importantly, there are no
more jumps around the AST with the `visit` functions, which in some cases could
result in incorrect visitor context while in the `visit` function. See the added
test, which would cause Verilator to segfault before this patch.
In testing, verilation performance was not shown to be affected by this change.
Though there is a slight performance improvement from this patch, due to adding
one more check before refreshing class member cache.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
* V3Common.cpp::makeVlToString: fix `VL_TOSTRING_W` statement generation to include width argument
* fix contribution name
* add testcase for long struct `VL_TO_STRING_W` bug
`VlNow{}` is completely unnecessary, as coroutines are always on the
heap (unless optimized out). Also fix access of var ref passed to forked processes.
Apart from the representational changes below, this patch renames
AstNodeMath to AstNodeExpr, and AstCMath to AstCExpr.
Now every expression (i.e.: those AstNodes that represent a [possibly
void] value, with value being interpreted in a very general sense) has
AstNodeExpr as a super class. This necessitates the introduction of an
AstStmtExpr, which represents an expression in statement position, e.g :
'foo();' would be represented as AstStmtExpr(AstCCall(foo)). In exchange
we can get rid of isStatement() in AstNodeStmt, which now really always
represent a statement
Peak memory consumption and verilation speed are not measurably changed.
Partial step towards #3420
In non-static contexts like class objects or stack frames, the use of
global trigger evaluation is not feasible. The concept of dynamic
triggers allows for trigger evaluation in such cases. These triggers are
simply local variables, and coroutines are themselves responsible for
evaluating them. They await the global dynamic trigger scheduler object,
which is responsible for resuming them during the trigger evaluation
step in the 'act' eval region. Once the trigger is set, they await the
dynamic trigger scheduler once again, and then get resumed during the
resumption step in the 'act' eval region.
Signed-off-by: Krzysztof Bieganski <kbieganski@antmicro.com>
Also added a testing only -fno-const-before-dfg option, as otherwise
V3Const eats up a lot of the simple inputs. A lot of the things V3Const
swallows in the simple cases can make it to DFG in complex cases, or DFG
itself can create them during optimization. In any case to save
complexity of testing DFG constant folding, we use this option to turn
off V3Const prior to the DFG passes in the relevant test.
Use the same style, and reuse the bulk of astgen to generate DfgVertex
related code. In particular allow for easier definition of custom
DfgVertex sub-types that do not directly correspond to an AstNode
sub-type. Also introduces specific names for the fixed arity vertices.
No functional change intended.
Added a new data-flow graph (DFG) based combinational logic optimizer.
The capabilities of this covers a combination of V3Const and V3Gate, but
is also more capable of transforming combinational logic into simplified
forms and more.
This entail adding a new internal representation, `DfgGraph`, and
appropriate `astToDfg` and `dfgToAst` conversion functions. The graph
represents some of the combinational equations (~continuous assignments)
in a module, and for the duration of the DFG passes, it takes over the
role of AstModule. A bulk of the Dfg vertices represent expressions.
These vertex classes, and the corresponding conversions to/from AST are
mostly auto-generated by astgen, together with a DfgVVisitor that can be
used for dynamic dispatch based on vertex (operation) types.
The resulting combinational logic graph (a `DfgGraph`) is then optimized
in various ways. Currently we perform common sub-expression elimination,
variable inlining, and some specific peephole optimizations, but there
is scope for more optimizations in the future using the same
representation. The optimizer is run directly before and after inlining.
The pre inline pass can operate on smaller graphs and hence converges
faster, but still has a chance of substantially reducing the size of the
logic on some designs, making inlining both faster and less memory
intensive. The post inline pass can then optimize across the inlined
module boundaries. No optimization is performed across a module
boundary.
For debugging purposes, each peephole optimization can be disabled
individually via the -fno-dfg-peepnole-<OPT> option, where <OPT> is one
of the optimizations listed in V3DfgPeephole.h, for example
-fno-dfg-peephole-remove-not-not.
The peephole patterns currently implemented were mostly picked based on
the design that inspired this work, and on that design the optimizations
yields ~30% single threaded speedup, and ~50% speedup on 4 threads. As
you can imagine not having to haul around redundant combinational
networks in the rest of the compilation pipeline also helps with memory
consumption, and up to 30% peak memory usage of Verilator was observed
on the same design.
Gains on other arbitrary designs are smaller (and can be improved by
analyzing those designs). For example OpenTitan gains between 1-15%
speedup depending on build type.
- Rename `--dump-treei` option to `--dumpi-tree`, which itself is now a
special case of `--dumpi-<tag>` where tag can be a magic word, or a
filename
- Control dumping via static `dump*()` functions, analogous to `debug()`
- Make dumping independent of the value of `debug()` (so dumping always
works even without the debug flag)
- Add separate `--dumpi-graph` for dumping V3Graphs, which is again a
special case of `--dumpi-<tag>`
- Alias `--dump-<tag>` to `--dumpi-<tag> 3` as before
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.
Introduce the @astgen directives parsed by astgen, currently used for
the generation child node (operand) accessors. Please see the updated
internal documentation for details.