mirror of
https://github.com/verilator/verilator.git
synced 2025-01-21 22:04:03 +00:00
2247 lines
85 KiB
ReStructuredText
|Logo|

=====================
Verilator Internals
=====================

.. contents::
   :depth: 3

Introduction
============


This file discusses internal and programming details for Verilator. It is
a reference for developers and for debugging problems.

See also the Verilator internals presentation at
https://www.veripool.org.


Code Flows
==========

Verilator Flow
--------------

The main flow of Verilator can be followed by reading the ``process()``
function in ``Verilator.cpp``:

1. First, the files specified on the command line are read. Reading
   involves preprocessing, then lexical analysis with Flex and parsing
   with Bison. This produces an abstract syntax tree (AST)
   representation of the design, which is what is visible in the .tree
   files described below.

2. Verilator then makes a series of passes over the AST, progressively
   refining and optimizing it.

3. Cells in the AST are first linked, which will read and parse additional
   files as above.

4. Functions, variables, and other references are linked to their
   definitions.

5. Parameters are resolved, and the design is elaborated.

6. Verilator then performs additional edits and optimizations on
   the hierarchical design. This includes coverage, assertions, X
   elimination, inlining, constant propagation, and dead code
   elimination.

7. References in the design are then pseudo-flattened. Each module's
   variables and functions get "Scope" references. A scope reference is
   an occurrence of that un-flattened variable in the flattened
   hierarchy. A module that occurs only once in the hierarchy will have
   a single scope and a single VarScope for each variable. A module that
   occurs twice will have a scope for each occurrence, and two
   VarScopes for each variable. This allows optimizations to proceed
   across the flattened design while still preserving the hierarchy.

8. Additional edits and optimizations proceed on the pseudo-flat
   design. These include module references, function inlining, loop
   unrolling, variable lifetime analysis, lookup table creation, always
   splitting, and logic gate simplifications (pushing inverters, etc.).

9. Verilator orders the code. In the best case, this results in a single
   "eval" function, which has all always statements flowing from top to
   bottom with no loops.

10. Verilator mostly removes the flattening, so that code may be shared
    between multiple invocations of the same module. It localizes
    variables, combines identical functions, expands macros to C
    primitives, adds branch prediction hints, and performs additional
    constant propagation.

11. Verilator finally writes the C++ modules.

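
The scope bookkeeping of step 7 can be illustrated with a small toy model.
This is a sketch only: ``Module``, ``Instance``, ``VarScope`` and
``buildScopes`` are hypothetical stand-ins rather than Verilator's
``AstScope``/``AstVarScope`` classes, and the dotted scope names are an
assumption made for readability. It creates one scope per occurrence of a
module and one VarScope per variable in each scope:

::

   #include <cassert>
   #include <string>
   #include <vector>

   struct Module;

   struct Instance {
       std::string name;    // instance name within the parent module
       const Module* modp;  // module being instantiated
   };

   struct Module {
       std::string name;
       std::vector<std::string> vars;  // variables declared in the module
       std::vector<Instance> cells;    // sub-instances (cells)
   };

   struct VarScope {
       std::string scope;  // hierarchical path of this occurrence
       std::string var;    // the un-flattened variable it refers to
   };

   // Recursively create one scope per occurrence of a module, plus one
   // VarScope for every variable of the module in that scope.
   void buildScopes(const Module& mod, const std::string& path,
                    std::vector<std::string>& scopes,
                    std::vector<VarScope>& varScopes) {
       scopes.push_back(path);
       for (const std::string& var : mod.vars) varScopes.push_back({path, var});
       for (const Instance& cell : mod.cells) {
           buildScopes(*cell.modp, path + "." + cell.name, scopes, varScopes);
       }
   }

With a module ``sub`` instantiated twice under ``top`` (as ``u0`` and
``u1``), ``sub`` gets the two scopes ``top.u0`` and ``top.u1``, and therefore
two VarScopes for each of its variables, while ``top`` gets a single scope.
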

Key Classes Used in the Verilator Flow
--------------------------------------

``AstNode``
~~~~~~~~~~~

The AST is represented at the top level by the class ``AstNode``. This
abstract class has derived classes for the individual components (e.g.,
``AstGenerate`` for a generate block) or groups of components (e.g.,
``AstNodeFTask`` for functions and tasks, which in turn has ``AstFunc`` and
``AstTask`` as derived classes). An important property of the ``AstNode``
type hierarchy is that all non-final subclasses of ``AstNode`` (i.e., those
which themselves have subclasses) must be abstract as well, and be named
with the prefix ``AstNode*``. The ``astgen`` script (see below) relies on
this.

Each ``AstNode`` has pointers to up to four children, accessed by the
``op1p`` through ``op4p`` methods. Each specific Ast\* node class then wraps
these methods in accessors with more descriptive names. For example, in the
``AstIf`` node (for ``if`` statements), ``thensp`` calls ``op2p`` to give the
pointer to the AST for the "then" block, while ``elsesp`` calls ``op3p`` to
give the pointer to the AST for the "else" block, or null if there is not
one. These accessors are automatically generated by ``astgen`` after
parsing the ``@astgen`` directives in the specific ``AstNode`` subclasses.

``AstNode`` has the concept of a next and previous AST, for example, the
next and previous statements in a block. Pointers to the AST for these
statements (if they exist) can be obtained using the ``next`` and ``back``
methods, respectively.

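
The operand-slot layout can be pictured with a toy model. In this
hypothetical sketch, ``Node`` and ``IfNode`` are not ``AstNode`` and
``AstIf``, and storing the condition in the first slot is an assumption; the
descriptive accessors simply forward to the generic slots, and sibling
statements are chained with next/back pointers. In this toy, ``backp`` is
simply the previous sibling:

::

   #include <cassert>
   #include <string>
   #include <utility>

   // Toy stand-in for an AST node: four generic child slots plus sibling links.
   struct Node {
       std::string kind;
       Node* m_op[4] = {nullptr, nullptr, nullptr, nullptr};
       Node* m_nextp = nullptr;  // next sibling (e.g. next statement)
       Node* m_backp = nullptr;  // previous sibling
       explicit Node(std::string k) : kind{std::move(k)} {}
       virtual ~Node() = default;
       Node* op1p() const { return m_op[0]; }
       Node* op2p() const { return m_op[1]; }
       Node* op3p() const { return m_op[2]; }
       Node* op4p() const { return m_op[3]; }
       Node* nextp() const { return m_nextp; }
       Node* backp() const { return m_backp; }
       // Append 'newp' at the end of this node's sibling list.
       void addNext(Node* newp) {
           Node* lastp = this;
           while (lastp->m_nextp) lastp = lastp->m_nextp;
           lastp->m_nextp = newp;
           newp->m_backp = lastp;
       }
   };

   // Toy 'if' node: descriptive accessors wrap the generic slots.
   struct IfNode final : Node {
       IfNode(Node* cond, Node* thens, Node* elses) : Node{"if"} {
           m_op[0] = cond;
           m_op[1] = thens;
           m_op[2] = elses;
       }
       Node* condp() const { return op1p(); }
       Node* thensp() const { return op2p(); }
       Node* elsesp() const { return op3p(); }  // nullptr if there is no else
   };
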

It is useful to remember that the derived class ``AstNetlist`` is at the
top of the tree, so checking for this class is the standard way to see if
you are at the top of the tree.

By convention, each function/method uses the variable ``nodep`` as a
pointer to the ``AstNode`` currently being processed.

There are notable sub-hierarchies of the ``AstNode`` sub-types, namely:

1. All AST nodes representing data types derive from ``AstNodeDType``.

2. All AST nodes representing expressions (i.e., anything that stands for,
   or evaluates to, a value) derive from ``AstNodeExpr``.


``VNVisitor``
~~~~~~~~~~~~~

The passes are implemented by AST visitor classes, which are subclasses of
the abstract class ``VNVisitor``. Each pass creates an instance of the
visitor class, which in turn implements a method to perform the pass.

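
The visitor pattern behind the passes can be sketched as follows. The types
here (``Visitor``, ``Expr``, ``Const``, ``Add``, ``CountConstsVisitor``) are
hypothetical and use plain double dispatch; they are not ``VNVisitor`` or the
``astgen``-generated node classes. The toy pass is a visitor subclass whose
constructor runs the traversal, so the pass is simply an instance that is
created, queried, and discarded:

::

   #include <cassert>
   #include <memory>
   #include <utility>

   struct Const;
   struct Add;

   // Abstract visitor: one visit() overload per concrete node type.
   struct Visitor {
       virtual ~Visitor() = default;
       virtual void visit(Const& node) = 0;
       virtual void visit(Add& node) = 0;
   };

   struct Expr {
       virtual ~Expr() = default;
       virtual void accept(Visitor& v) = 0;  // double-dispatch entry point
   };

   struct Const final : Expr {
       int value;
       explicit Const(int v) : value{v} {}
       void accept(Visitor& v) override { v.visit(*this); }
   };

   struct Add final : Expr {
       std::unique_ptr<Expr> lhsp, rhsp;
       Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
           : lhsp{std::move(l)}, rhsp{std::move(r)} {}
       void accept(Visitor& v) override { v.visit(*this); }
   };

   // One toy "pass": counts constant leaves. Constructing it runs the traversal.
   class CountConstsVisitor final : public Visitor {
       int m_count = 0;
       void visit(Const&) override { ++m_count; }
       void visit(Add& node) override {  // recurse into both operands
           node.lhsp->accept(*this);
           node.rhsp->accept(*this);
       }

   public:
       explicit CountConstsVisitor(Expr& root) { root.accept(*this); }
       int count() const { return m_count; }
   };
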

``V3Graph``
~~~~~~~~~~~

A number of passes use graph algorithms, and the class ``V3Graph`` is
provided to represent those graphs. Graphs are directed, and algorithms are
provided to manipulate the graphs and output them in `GraphViz
<https://www.graphviz.org>`__ dot format. ``V3Graph.h`` provides
documentation of this class.


``V3GraphVertex``
~~~~~~~~~~~~~~~~~

``V3GraphVertex`` is the base class for vertices in a graph. Vertices have
an associated ``fanout``, ``color`` and ``rank``, which may be used in
algorithms for ordering the graph. A generic ``user``/``userp`` member
variable is also provided.

Virtual methods are provided to specify the name, color, shape, and style
to be used in dot output. Typically users provide derived classes from
``V3GraphVertex`` which will reimplement these methods.

Iterators are provided to access in and out edges. Typically these are used
in the form:

::

   for (V3GraphEdge* edgep = vertexp->inBeginp();
        edgep;
        edgep = edgep->inNextp()) {
       // ... process the incoming edge edgep ...
   }


``V3GraphEdge``
~~~~~~~~~~~~~~~

``V3GraphEdge`` is the base class for directed edges between pairs of
vertices. Edges have an associated ``weight`` and may also be made
``cutable``. A generic ``user``/``userp`` member variable is also provided.

The accessors ``fromp`` and ``top`` return the "from" and "to" vertices,
respectively.

Virtual methods are provided to specify the label, color, and style to be
used in dot output. Typically users provide derived classes from
``V3GraphEdge``, which will reimplement these methods.


``V3GraphAlg``
~~~~~~~~~~~~~~

This is the base class for graph algorithms. It implements a ``bool``
method, ``followEdge``, which algorithms can use to decide whether an edge
is followed. This method returns true if the graph edge has a non-zero weight
and a user function, ``edgeFuncp`` (supplied in the constructor), returns
``true``.

A number of predefined derived algorithm classes and access methods are
provided and documented in ``V3GraphAlg.cpp``.

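
Both the intrusive in-edge iteration and the ``followEdge`` filter can be
exercised on a toy graph. Everything below is a hypothetical simplification:
the member names echo this section, but ``Graph``, ``GraphAlgSketch`` and the
ownership scheme are assumptions, and the filter in this sketch follows an
edge when its weight is non-zero and a user-supplied predicate accepts it:

::

   #include <cassert>
   #include <functional>
   #include <memory>
   #include <utility>
   #include <vector>

   struct Vertex;

   // Toy directed edge, threaded onto the "to" vertex's list of in-edges.
   struct Edge {
       Vertex* fromp;
       Vertex* top;
       int weight;
       Edge* inNextp = nullptr;  // next edge into the same vertex
       Edge(Vertex* f, Vertex* t, int w) : fromp{f}, top{t}, weight{w} {}
   };

   struct Vertex {
       int id;
       Edge* inHeadp = nullptr;
       explicit Vertex(int i) : id{i} {}
       Edge* inBeginp() const { return inHeadp; }
   };

   struct Graph {
       std::vector<std::unique_ptr<Vertex>> vertices;  // owns all vertices
       std::vector<std::unique_ptr<Edge>> edges;       // owns all edges
       Vertex* addVertex() {
           vertices.push_back(std::make_unique<Vertex>(static_cast<int>(vertices.size())));
           return vertices.back().get();
       }
       void addEdge(Vertex* fromp, Vertex* top, int weight) {
           edges.push_back(std::make_unique<Edge>(fromp, top, weight));
           Edge* edgep = edges.back().get();
           edgep->inNextp = top->inHeadp;  // push onto the front of the in-list
           top->inHeadp = edgep;
       }
   };

   // Toy analogue of a graph algorithm base class with an edge filter.
   class GraphAlgSketch {
       std::function<bool(const Edge*)> m_edgeFunc;  // user predicate

   public:
       explicit GraphAlgSketch(std::function<bool(const Edge*)> f)
           : m_edgeFunc{std::move(f)} {}
       bool followEdge(const Edge* edgep) const {
           return edgep->weight != 0 && m_edgeFunc(edgep);
       }
       // Count the in-edges of a vertex that this algorithm would follow.
       int followedInEdges(const Vertex* vertexp) const {
           int n = 0;
           for (Edge* edgep = vertexp->inBeginp(); edgep; edgep = edgep->inNextp) {
               if (followEdge(edgep)) ++n;
           }
           return n;
       }
   };
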

``DfgGraph``
~~~~~~~~~~~~

The data-flow graph-based combinational logic optimizer (DFG optimizer)
converts an ``AstModule`` into a ``DfgGraph``. The graph represents the
combinational equations (roughly, the continuous assignments) in the module,
and for the duration of the DFG passes, it takes over the role of the
represented ``AstModule``. The ``DfgGraph`` keeps hold of the represented
``AstModule``, and the ``AstModule`` retains all other logic that is not
representable as a data-flow graph. At the end of optimization, the
combinational logic represented by the ``DfgGraph`` is converted back into AST
form and is re-inserted into the corresponding ``AstModule``. The ``DfgGraph``
is kept distinct from ``V3Graph`` for efficiency and for other desirable
properties which make writing DFG passes easier.


``DfgVertex``
~~~~~~~~~~~~~

The ``DfgGraph`` represents combinational logic equations as a graph of
``DfgVertex`` vertices. Each sub-class of ``DfgVertex`` corresponds to an
expression (a sub-class of ``AstNodeExpr``), a constant, or a variable
reference. LValues and RValues referencing the same storage location are
represented by the same ``DfgVertex``. Consumers of such vertices read as the
LValue, writers of such vertices write the RValue. The bulk of the final
``DfgVertex`` sub-classes are generated by ``astgen`` from the corresponding
``AstNode`` definitions.

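
A toy rendering of this idea, for the continuous assignments
``assign a = b + 1;`` and ``assign c = a & d;``, is shown below. The vertex
classes are hypothetical and far simpler than the generated ``DfgVertex``
sub-classes, and the recursive ``eval`` exists only to demonstrate the
connectivity. The key point is that ``a`` is a single vertex, driven by the
adder and read by the AND:

::

   #include <cassert>
   #include <cstdint>
   #include <string>
   #include <utility>

   // Toy data-flow vertex; eval() computes the value the vertex represents.
   struct DVertex {
       virtual ~DVertex() = default;
       virtual std::uint32_t eval() const = 0;
   };

   struct DConst final : DVertex {
       std::uint32_t value;
       explicit DConst(std::uint32_t v) : value{v} {}
       std::uint32_t eval() const override { return value; }
   };

   // A variable: one vertex shared by its writer (driverp) and all its readers.
   struct DVar final : DVertex {
       std::string name;
       const DVertex* driverp = nullptr;  // nullptr: not driven here (an input)
       std::uint32_t inputValue = 0;      // value used when undriven
       explicit DVar(std::string n) : name{std::move(n)} {}
       std::uint32_t eval() const override {
           return driverp ? driverp->eval() : inputValue;
       }
   };

   struct DAdd final : DVertex {
       const DVertex* lhsp;
       const DVertex* rhsp;
       DAdd(const DVertex* l, const DVertex* r) : lhsp{l}, rhsp{r} {}
       std::uint32_t eval() const override { return lhsp->eval() + rhsp->eval(); }
   };

   struct DAnd final : DVertex {
       const DVertex* lhsp;
       const DVertex* rhsp;
       DAnd(const DVertex* l, const DVertex* r) : lhsp{l}, rhsp{r} {}
       std::uint32_t eval() const override { return lhsp->eval() & rhsp->eval(); }
   };
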

Scheduling
----------

Verilator implements the Active and NBA regions of the SystemVerilog scheduling
model as described in IEEE 1800-2017 chapter 4, and in particular section 4.5
and Figure 4.1. The static (Verilation time) scheduling of SystemVerilog
processes is performed by code in the ``V3Sched`` namespace. The single
entry point to the scheduling algorithm is ``V3Sched::schedule``. Some
preparatory transformations important for scheduling are also performed in
``V3Active`` and ``V3ActiveTop``. High-level evaluation functions are
constructed by ``V3Order``, which ``V3Sched`` invokes on subsets of the logic
in the design.

Scheduling deals with the problem of evaluating 'logic' in the correct order
and the correct number of times in order to compute the correct state of the
SystemVerilog program. Throughout this section, we use the term 'logic' to
refer to all SystemVerilog constructs that describe the evolution of the state
of the program. In particular, all SystemVerilog processes and continuous
assignments are considered 'logic', but, for example, variable definitions
without initialization and other miscellaneous constructs are not.


Classes of logic
~~~~~~~~~~~~~~~~

The first step in the scheduling algorithm is to gather all the logic present
in the design, and classify it based on the conditions under which the logic
needs to be evaluated.

The classes of logic we distinguish between are:

- SystemVerilog ``initial`` processes, which need to be executed once at
  startup.

- Static variable initializers. These are a separate class as they need to be
  executed before ``initial`` processes.

- SystemVerilog ``final`` processes.

- Combinational logic. Any process or construct that has an implicit
  sensitivity list with no explicit sensitivities is considered 'combinational'
  logic. This includes, among other things, ``always @*`` and ``always_comb``
  processes, and continuous assignments. Verilator also converts some other
  ``always`` processes to combinational logic in ``V3Active`` as described
  below.

- Clocked logic. Any process or construct that has an explicit sensitivity
  list, with no implicit sensitivities, is considered 'clocked' (or
  'sequential') logic. This includes, among other things, ``always`` and
  ``always_ff`` processes with an explicit sensitivity list.

Note that the distinction between clocked logic and combinational logic is only
important for the scheduling algorithm within Verilator, as we handle the two
classes differently. It is possible to convert clocked logic into combinational
logic if the explicit sensitivity list of the clocked logic is the same as the
implicit sensitivity list that the equivalent combinational logic would have.
The canonical examples are ``always @(a) x = a;``, which is considered to be
clocked logic by Verilator, and the equivalent ``assign x = a;``, which is
considered to be combinational logic. ``V3Active`` in fact converts all clocked
logic to combinational logic whenever possible, as this provides advantages for
scheduling as described below.

There is also a 'hybrid' logic class, which has both explicit and implicit
sensitivities. This kind of logic does not arise from a SystemVerilog
construct, but is created during scheduling to break combinational cycles.
Details of this process and the hybrid logic class are described below.

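
The classification rule, together with a simplified version of the
clocked-to-combinational conversion check, might be sketched as below.
``LogicClass``, ``classify`` and ``convertibleToComb`` are hypothetical
helpers, not ``V3Active`` code; treating logic with no sensitivities at all as
combinational, and modeling the conversion check as "no edge sensitivities and
exactly the variables read", are assumptions of the sketch:

::

   #include <cassert>
   #include <set>
   #include <string>

   enum class LogicClass { Combinational, Clocked, Hybrid };

   // hasExplicit: the construct lists explicit sensitivities, e.g. @(posedge clk).
   // hasImplicit: sensitivity is implied by what the logic reads, e.g. always_comb.
   LogicClass classify(bool hasExplicit, bool hasImplicit) {
       if (hasExplicit && hasImplicit) return LogicClass::Hybrid;
       if (hasExplicit) return LogicClass::Clocked;
       return LogicClass::Combinational;  // includes logic with no sensitivities
   }

   // Simplified check: clocked logic that is sensitive to any change of exactly
   // the variables it reads, with no edge sensitivities, such as
   // 'always @(a) x = a;', behaves like the combinational 'assign x = a;'.
   bool convertibleToComb(const std::set<std::string>& levelSens,
                          const std::set<std::string>& reads, bool hasEdgeSens) {
       return !hasEdgeSens && levelSens == reads;
   }
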

Scheduling of simple classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SystemVerilog ``initial`` and ``final`` blocks can be scheduled (executed) in an
arbitrary order.

Static variable initializers need to be executed in source code order in case
there is a dependency between initializers, but the ordering of static variable
initialization is otherwise not defined by the SystemVerilog standard
(particularly, in the presence of hierarchical references in static variable
initializers).

The scheduling algorithm handles all three of these classes the same way and
schedules the logic in these classes in source code order. This step yields the
``_eval_static``, ``_eval_initial`` and ``_eval_final`` functions which execute
the corresponding logic constructs.


Scheduling of clocked and combinational logic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For performance, clocked and combinational logic needs to be ordered.
Conceptually, this minimizes the iterations through the evaluation loop
presented in the reference algorithm in the SystemVerilog standard (IEEE
1800-2017 section 4.5), by evaluating logic constructs in data-flow order.
Without going into a lot of detail here, accept that well-thought-out ordering
is crucial to good simulation performance, and also enables further
optimizations later on.

At the highest level, ordering is performed by ``V3Order::order``, which is
invoked by ``V3Sched::schedule`` on various subsets of the combinational and
clocked logic as described below. The important thing to highlight now is that
``V3Order::order`` operates by assuming that the state of all variables driven
by combinational logic is consistent with that combinational logic. While this
might seem subtle, it is very important, so here is an example:

::

   always_comb d = q + 2;
   always @(posedge clock) q <= d;

During ordering, ``V3Order`` will assume that ``d`` equals ``q + 2`` at the
beginning of an evaluation step. As a result, it will order the clocked logic
first, and all downstream combinational logic (like the assignment to ``d``)
will execute after the clocked logic that drives inputs to the combinational
logic, in data-flow (or dependency) order. At the end of the evaluation step,
this ordering restores the invariant that variables driven by combinational
logic are consistent with that combinational logic (i.e., the circuit is in a
settled/steady state).

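
The example can be replayed with a hand-ordered toy evaluation step
(``State`` and ``evalStep`` are hypothetical, not code generated by
Verilator). The clocked logic runs first, sampling ``d`` and committing the
NBA, and the downstream combinational assignment runs afterwards, so
``d == q + 2`` holds again at the end of every step:

::

   #include <cassert>

   struct State {
       int q = 0;
       int d = 2;  // starts settled: d == q + 2
   };

   // One evaluation step, with the logic in data-flow order.
   void evalStep(State& s, bool posedgeClock) {
       if (posedgeClock) {
           const int qNext = s.d;  // always @(posedge clock) q <= d;  (sample RHS)
           s.q = qNext;            // commit the NBA
       }
       s.d = s.q + 2;              // always_comb d = q + 2;  (after its driver)
   }
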

One of the most important optimizations for performance is to only evaluate
combinational logic if its inputs might have changed. For example, there is no
point in evaluating the above assignment to ``d`` on a negative edge of the
clock signal. Verilator does this by pushing the combinational logic into the
same (possibly multiple) event domains as the logic driving the inputs to that
combinational logic, and only evaluating the combinational logic if at least
one driving domain has been triggered. The impact of this activity gating is
very high (a 100x slowdown has been observed on large designs when it is turned
off), which is why we prefer to convert clocked logic to combinational logic in
``V3Active`` whenever possible.

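
Activity gating can be illustrated with the same two statements. In this
hypothetical toy, the set of triggered event domains is a bitmask
(``kPosedgeClk`` and ``kNegedgeClk`` are made-up names), and ``d = q + 2``
has been placed in the domain of its only driver, so a step in which only the
negative edge fired skips it entirely:

::

   #include <cassert>

   enum Domain : unsigned { kPosedgeClk = 1u << 0, kNegedgeClk = 1u << 1 };

   struct GatedModel {
       int q = 0;
       int d = 2;          // settled: d == q + 2
       int combEvals = 0;  // how often 'd = q + 2' actually ran

       // 'triggered' is the set of event domains that fired in this step.
       void eval(unsigned triggered) {
           if (triggered & kPosedgeClk) q = d;  // always @(posedge clock) q <= d;
           // 'd = q + 2' was pushed into the domains driving 'q' (only
           // kPosedgeClk), so it is skipped unless one of them fired.
           if (triggered & kPosedgeClk) {
               d = q + 2;
               ++combEvals;
           }
       }
   };
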

The ordering procedure described above works straightforwardly unless there are
combinational logic constructs that are circularly dependent (a.k.a. the
UNOPTFLAT warning). Combinational scheduling loops can arise in sound
(realizable) circuits, as Verilator considers each SystemVerilog process as a
unit of scheduling. Although we try to split processes into smaller ones to
avoid this circularity problem whenever possible, this is not always possible.


Breaking combinational loops
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Combinational loops are broken by the introduction of instances of the 'hybrid'
logic class. As described in the previous section, combinational loops require
iteration until the logic is settled, in order to restore the invariant that
combinationally driven signals are consistent with the combinational logic.

To achieve this, ``V3Sched::schedule`` calls ``V3Sched::breakCycles``, which
builds a dependency graph of all combinational logic in the design, and then
breaks all combinational cycles by converting all combinational logic that
consumes a variable driven via a 'back-edge' into hybrid logic. Here,
'back-edge' just means a graph edge that points from a higher-rank vertex to a
lower-rank vertex in some consistent ranking of the directed graph. Variables
driven via a back-edge in the dependency graph are marked, and all
combinational logic that depends on such variables is converted into hybrid
logic, with the back-edge driven variables listed as explicit 'changed'
sensitivities.

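
A minimal sketch of the back-edge marking, assuming a toy dependency graph in
which an edge ``u -> v`` labeled with a variable means that logic ``v`` reads
that variable as written by logic ``u``. The ranking used here (reverse
depth-first post-order) is just one consistent choice, and ``DepGraph``,
``rankVertices`` and ``backEdgeVars`` are hypothetical, not
``V3Sched::breakCycles`` itself. Any cycle must contain at least one edge that
does not go to a strictly higher rank, so marking the variables on those edges
breaks every cycle, while an acyclic graph yields no marks:

::

   #include <algorithm>
   #include <cassert>
   #include <functional>
   #include <set>
   #include <string>
   #include <vector>

   struct DepEdge {
       int to;           // logic block that reads the variable
       std::string var;  // variable carrying the dependency
   };
   // Index = logic block; each entry lists the blocks consuming its outputs.
   using DepGraph = std::vector<std::vector<DepEdge>>;

   // Rank vertices by reverse DFS post-order. On an acyclic graph this is a
   // topological order; on a cyclic graph it is still a consistent ranking.
   std::vector<int> rankVertices(const DepGraph& g) {
       const int n = static_cast<int>(g.size());
       std::vector<int> order;
       std::vector<bool> seen(n, false);
       std::function<void(int)> dfs = [&](int u) {
           seen[u] = true;
           for (const DepEdge& e : g[u]) {
               if (!seen[e.to]) dfs(e.to);
           }
           order.push_back(u);
       };
       for (int u = 0; u < n; ++u) {
           if (!seen[u]) dfs(u);
       }
       std::reverse(order.begin(), order.end());
       std::vector<int> rank(n);
       for (int i = 0; i < n; ++i) rank[order[i]] = i;
       return rank;
   }

   // Variables driven via a back-edge (an edge not going to a strictly higher
   // rank). Logic reading these variables would become hybrid logic.
   std::set<std::string> backEdgeVars(const DepGraph& g) {
       const std::vector<int> rank = rankVertices(g);
       std::set<std::string> vars;
       for (int u = 0; u < static_cast<int>(g.size()); ++u) {
           for (const DepEdge& e : g[u]) {
               if (rank[u] >= rank[e.to]) vars.insert(e.var);
           }
       }
       return vars;
   }
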

Hybrid logic is handled by ``V3Order`` mostly in the same way as combinational
logic, with two exceptions:

- Explicit sensitivities of hybrid logic are ignored for the purposes of
  data-flow ordering with respect to other combinational or hybrid logic, i.e.,
  an explicit sensitivity suppresses the implicit sensitivity on the same
  variable. This could also be interpreted as ordering the hybrid logic as if
  all variables listed as explicit sensitivities were substituted as constants
  with their current values.

- The explicit sensitivities are included as an additional driving domain of
  the logic, and also cause evaluation when triggered.

This means that hybrid logic is evaluated either when any of its implicit
sensitivities might have been updated (the same way as combinational logic, by
pushing it into the domains that write those variables), or when any of its
explicit sensitivities are triggered.

The effect of this transformation is that ``V3Order`` can proceed as if there
are no combinational cycles (or alternatively, under the assumption that the
back-edge-driven variables don't change during one evaluation pass). The
evaluation loop invoking the ordered code will then re-invoke it on a follow-on
iteration if any of the explicit sensitivities of hybrid logic have actually
changed due to the previous invocation, iterating until all the combinational
(including hybrid) logic has settled.

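
The follow-on iteration can be sketched as a fixed-point loop on a
two-statement combinational cycle, ``a = b | in`` and ``b = a & en``, where
``b`` is taken to be the back-edge-driven variable. ``LoopState``,
``evalPass`` and ``settle`` are hypothetical, and the iteration limit is an
assumption added so that the toy cannot loop forever:

::

   #include <cassert>
   #include <stdexcept>

   struct LoopState {
       bool in = false, en = false;  // inputs
       bool a = false, b = false;    // combinationally driven, in a cycle
   };

   // One pass of the ordered code. 'b' is back-edge driven, so the hybrid logic
   // reading it is ordered as if 'b' were a constant. Returns true if 'b' (the
   // explicit 'changed' sensitivity of the hybrid logic) changed in this pass.
   bool evalPass(LoopState& s) {
       const bool oldB = s.b;
       s.a = s.b || s.in;   // hybrid logic: explicitly sensitive to changes of b
       s.b = s.a && s.en;   // ordinary combinational logic
       return s.b != oldB;
   }

   // Re-invoke the ordered code until no explicit sensitivity triggers.
   int settle(LoopState& s, int maxIterations = 100) {
       int iterations = 0;
       bool triggered = true;
       while (triggered) {
           if (++iterations > maxIterations) throw std::runtime_error("did not settle");
           triggered = evalPass(s);
       }
       return iterations;
   }
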

One might wonder if there can be a race condition between clocked logic
triggered due to a combinational signal change from the previous evaluation
pass, and a combinational loop settling due to hybrid logic, if the clocked
logic reads the not-yet-settled combinationally driven signal. Such a race
is indeed possible, but our evaluation is consistent with the SystemVerilog
scheduling semantics (IEEE 1800-2017 chapter 4), and therefore any program
that exhibits such a race has non-deterministic behavior according to the
SystemVerilog semantics, so we accept this.


Settling combinational logic after initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of simulation, once static initializers and ``initial`` blocks
have been executed, we need to evaluate all combinational logic, in order to
restore the invariant utilized by ``V3Order`` that the state of all
combinationally driven variables is consistent with the combinational logic.

To achieve this, we invoke ``V3Order::order`` on all of the combinational and
hybrid logic, and iterate the resulting evaluation function until no more
hybrid logic is triggered. This yields the ``_eval_settle`` function, which is
invoked at the beginning of simulation after ``_eval_initial``.


Partitioning logic for correct NBA updates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``V3Order`` can order logic corresponding to non-blocking assignments (NBAs) to
yield correct simulation results, as long as all the sensitivity expressions of
clocked logic triggered in the Active scheduling region of the current time
step are known up front. I.e., the ordering of NBA updates is only correct if
derived clocks that are computed in an Active region update (that is, via a
blocking or continuous assignment) are known up front.

We can ensure this by partitioning the logic into two regions. Note these
regions are a concept of the Verilator scheduling algorithm, and they do not
directly correspond to the similarly named SystemVerilog scheduling regions
as defined in the standard:

- All logic (clocked, combinational and hybrid) that transitively feeds into,
  or drives via a non-blocking or continuous assignment (or via any update
  that SystemVerilog executes in the Active scheduling region), a variable that
  is used in the explicit sensitivity list of some clocked or hybrid logic, is
  assigned to the 'act' region.

- All other logic is assigned to the 'nba' region.

For completeness, note that a subset of the 'act' region logic, specifically,
the logic related to the pre-assignments of NBA updates (i.e., ``AstAssignPre``
nodes), is handled separately, but is executed as part of the 'act' region.

Also note that all logic representing the committing of an NBA (i.e.,
``Ast*Post`` nodes) will be in the 'nba' region. This means that the evaluation
of the 'act' region logic will not commit any NBA updates. As a result, the
'act' region logic can be iterated to compute all derived clock signals up
front.

The correspondence between the SystemVerilog Active and NBA scheduling regions,
and the internal 'act' and 'nba' regions, is that 'act' contains all Active
region logic that can compute a clock signal, while 'nba' contains all other
Active and NBA region logic. For example, if the only clocks in the design are
top-level inputs, then 'act' will be empty, and 'nba' will contain the whole of
the design.

The partitioning described above is performed by ``V3Sched::partition``.

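
The 'act'/'nba' split amounts to a backward reachability computation from the
variables used in explicit sensitivity lists. A minimal sketch under
simplifying assumptions: each logic block is reduced to the sets of variables
it reads and writes (the ``Logic`` struct and ``partition`` function are
hypothetical), and the separate handling of ``AstAssignPre`` and NBA commit
logic is ignored. A block joins 'act' when it writes a variable known to feed
a clock, and its own inputs are then known to feed a clock too:

::

   #include <cassert>
   #include <cstddef>
   #include <set>
   #include <string>
   #include <vector>

   struct Logic {
       std::set<std::string> reads;
       std::set<std::string> writes;
   };

   // Returns, for each logic block, true if it belongs to the 'act' region.
   std::vector<bool> partition(const std::vector<Logic>& logic,
                               const std::set<std::string>& sensitivityVars) {
       std::set<std::string> feedsClock = sensitivityVars;  // grows to a fixed point
       std::vector<bool> isAct(logic.size(), false);
       bool changed = true;
       while (changed) {
           changed = false;
           for (std::size_t i = 0; i < logic.size(); ++i) {
               if (isAct[i]) continue;
               for (const std::string& var : logic[i].writes) {
                   if (feedsClock.count(var)) {
                       isAct[i] = true;  // this block computes (part of) a clock
                       feedsClock.insert(logic[i].reads.begin(), logic[i].reads.end());
                       changed = true;
                       break;
                   }
               }
           }
       }
       return isAct;
   }

For a gated clock ``gclk = clk & en`` that clocks a counter, only the block
computing ``gclk`` lands in 'act'; the counter and anything reading it stay in
'nba'.
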

Replication of combinational logic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We will separately invoke ``V3Order::order`` on the 'act' and 'nba' region
logic.

Combinational logic that reads variables driven from both 'act' and 'nba'
region logic has the problem of needing to be reevaluated even if only one of
the regions updates an input variable. We could pass additional trigger
expressions between the regions to make sure combinational logic is always
reevaluated, or we can replicate combinational logic that is driven from
multiple regions, by copying it into each region that drives it. Experiments
show this simple replication works well performance-wise (and notably
``V3Combine`` is good at combining the replicated code), so this is what we do
in ``V3Sched::replicateLogic``.

In ``V3Sched::replicateLogic``, in addition to replicating logic into the 'act'
and 'nba' regions, we also replicate combinational (and hybrid) logic that
depends on top-level inputs. These become a separate 'ico' region (Input
Combinational logic), which we will always evaluate at the beginning of a
time step to ensure the combinational invariant holds even if input signals
have changed. Note that this eliminates the need to change data and clock
signals in separate evaluations, as was necessary with earlier versions of
Verilator.


Constructing the top level `_eval` function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To construct the top-level ``_eval`` function, which updates the state of the
circuit to the end of the current time step, we invoke ``V3Order::order``
separately on the 'ico', 'act' and 'nba' logic, which yields the ``_eval_ico``,
``_eval_act``, and ``_eval_nba`` functions. We then put these all together with
the corresponding functions that compute the respective trigger expressions
into the top-level ``_eval`` function, which at a high level has the form:

::

   void _eval() {
       // Update combinational logic dependent on top level inputs ('ico' region)
       while (true) {
           _eval__triggers__ico();
           // If no 'ico' region trigger is active
           if (!ico_triggers.any()) break;
           _eval_ico();
       }

       // Iterate 'act' and 'nba' regions together
       while (true) {

           // Iterate 'act' region, this computes all derived clocks updated in the
           // Active scheduling region, but does not commit any NBAs that executed
           // in 'act' region logic.
           while (true) {
               _eval__triggers__act();
               // If no 'act' region trigger is active
               if (!act_triggers.any()) break;
               // Remember what 'act' triggers were active, 'nba' uses the same
               latch_act_triggers_for_nba();
               _eval_act();
           }

           // If no 'nba' region trigger is active
           if (!nba_triggers.any()) break;

           // Evaluate all other Active region logic, and commit NBAs
           _eval_nba();
       }
   }


Timing
------

Timing support in Verilator utilizes C++ coroutines, a feature introduced in
C++20. The basic idea is to represent processes and tasks that await a certain
event or simulation time as coroutines. These coroutines get suspended at the
await, and resumed whenever the triggering event occurs, or at the expected
simulation time.

There are several runtime classes used for managing such coroutines, defined in
``verilated_timing.h`` and ``verilated_timing.cpp``.


``VlCoroutineHandle``
~~~~~~~~~~~~~~~~~~~~~

A thin wrapper around an ``std::coroutine_handle<>``. It forces move semantics,
destroys the coroutine if it remains suspended at the end of the design's
lifetime, and prevents multiple ``resume`` calls in the case of
``fork..join_any``.


``VlCoroutine``
~~~~~~~~~~~~~~~

The return type of all coroutines. Together with the promise type contained
within, it allows for chaining coroutines, i.e., resuming coroutines further up
the call stack. The calling coroutine's handle is saved in the promise object as
a continuation, that is, the coroutine that must be resumed after the promise's
coroutine finishes. This is necessary as C++ coroutines are stackless, meaning
each one is suspended independently of others in the call graph.


``VlDelayScheduler``
~~~~~~~~~~~~~~~~~~~~

This class manages processes suspended by delays. There is one instance of this
class per design. Coroutines ``co_await`` this object's ``delay`` function.
Internally, they are stored in a heap structure sorted by simulation time in
ascending order. When ``resume`` is called on the delay scheduler, all
coroutines awaiting the current simulation time are resumed. The current
simulation time is retrieved from a ``VerilatedContext`` object.

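
The bookkeeping of the delay scheduler can be modeled without coroutines. In
this hedged sketch, ``std::function`` callbacks stand in for suspended
coroutine handles and ``std::priority_queue`` provides the heap ordered by
ascending time. The class name and member signatures are hypothetical, merely
echoing ``delay``, ``resume`` and ``awaitingCurrentTime`` from this document,
and simulation time is passed in explicitly instead of being read from a
``VerilatedContext``:

::

   #include <cassert>
   #include <cstdint>
   #include <functional>
   #include <queue>
   #include <utility>
   #include <vector>

   class DelaySchedulerSketch {
       struct Entry {
           std::uint64_t time;             // simulation time to resume at
           std::function<void()> resumer;  // stands in for a coroutine handle
       };
       struct Later {  // min-heap ordering: earliest time on top
           bool operator()(const Entry& a, const Entry& b) const { return a.time > b.time; }
       };
       std::priority_queue<Entry, std::vector<Entry>, Later> m_heap;

   public:
       // Analogue of 'co_await delay(...)': park the continuation until 'time'.
       void delay(std::uint64_t time, std::function<void()> resumer) {
           m_heap.push({time, std::move(resumer)});
       }
       // True if something waits for 'now' (or, defensively, an earlier time).
       bool awaitingCurrentTime(std::uint64_t now) const {
           return !m_heap.empty() && m_heap.top().time <= now;
       }
       // Resume every process waiting for the current simulation time.
       void resume(std::uint64_t now) {
           while (awaitingCurrentTime(now)) {
               std::function<void()> resumer = m_heap.top().resumer;
               m_heap.pop();
               resumer();  // may call delay() again to suspend further
           }
       }
       bool empty() const { return m_heap.empty(); }
   };
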

``VlTriggerScheduler``
~~~~~~~~~~~~~~~~~~~~~~

This class manages processes that await events (triggers). There is one such
object per trigger awaited by coroutines. Coroutines ``co_await`` this
object's ``trigger`` function. They are stored in two stages, `uncommitted`
and `ready`. First, they land in the `uncommitted` stage, and cannot be
resumed. The ``resume`` function resumes all coroutines from the `ready` stage
and moves `uncommitted` coroutines into `ready`. The ``commit`` function only
moves `uncommitted` coroutines into `ready`.

This split is done to avoid self-triggering and triggering coroutines multiple
times. See the `Scheduling with timing` section for details on how this is
used.

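
The two-stage scheme can be sketched with callbacks standing in for
coroutines. ``TriggerSchedulerSketch`` and its members are hypothetical and
do not reproduce ``VlTriggerScheduler``'s actual interface. A waiter that
re-awaits the trigger while being resumed lands in the uncommitted stage, so a
single ``resume`` cannot run it twice:

::

   #include <cassert>
   #include <cstddef>
   #include <functional>
   #include <utility>
   #include <vector>

   class TriggerSchedulerSketch {
       std::vector<std::function<void()>> m_uncommitted;  // cannot be resumed yet
       std::vector<std::function<void()>> m_ready;        // run by the next resume()

   public:
       // Analogue of 'co_await trigger()': new waiters start out uncommitted.
       void trigger(std::function<void()> resumer) {
           m_uncommitted.push_back(std::move(resumer));
       }
       // Resume the ready waiters and promote the previously uncommitted ones.
       void resume() {
           std::vector<std::function<void()>> toResume;
           toResume.swap(m_ready);  // only waiters that were already ready
           commit();                // earlier uncommitted waiters become ready
           for (auto& resumer : toResume) resumer();  // re-awaits go to uncommitted
       }
       // Only promote uncommitted waiters to ready.
       void commit() {
           for (auto& resumer : m_uncommitted) m_ready.push_back(std::move(resumer));
           m_uncommitted.clear();
       }
       std::size_t readyCount() const { return m_ready.size(); }
   };
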

``VlDynamicTriggerScheduler``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Like ``VlTriggerScheduler``, ``VlDynamicTriggerScheduler`` manages processes
that await triggers. However, it does not rely on triggers evaluated externally
by the 'act' trigger eval function. Instead, it is also responsible for trigger
evaluation. Coroutines that make use of this scheduler must adhere to a certain
procedure:

::

   __Vtrigger = 0;
   <locals and inits required for trigger eval>
   while (!__Vtrigger) {
       co_await __VdynSched.evaluation();
       <pre updates>;
       __Vtrigger = <trigger eval>;
       [optionally] co_await __VdynSched.postUpdate();
       <post updates>;
   }
   co_await __VdynSched.resumption();

The coroutines get resumed at trigger evaluation time, evaluate their local
triggers, optionally await the post update step, and if the trigger is set,
await proper resumption in the 'act' eval step.


``VlForkSync``
~~~~~~~~~~~~~~

Used for synchronizing ``fork..join`` and ``fork..join_any``. Forking
coroutines ``co_await`` its ``join`` function, and forked ones call ``done``
when they're finished. Once the required number of coroutines (set using
``setCounter``) finish execution, the forking coroutine is resumed.

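
A counter-based join can be sketched as follows. ``ForkSyncSketch`` is
hypothetical, and a callback stands in for the suspended forking coroutine.
In this sketch, ``fork..join`` would set the counter to the number of forked
processes and ``fork..join_any`` would set it to one; ignoring ``done`` calls
once the counter has reached zero is an assumption made so that late
finishers after ``join_any`` are harmless:

::

   #include <cassert>
   #include <functional>
   #include <utility>

   class ForkSyncSketch {
       int m_counter = 0;                     // forked processes still required
       std::function<void()> m_continuation;  // the suspended forking process

   public:
       void setCounter(int count) { m_counter = count; }
       // Forking side: continue at once if nothing is pending, else park.
       void join(std::function<void()> continuation) {
           if (m_counter == 0) {
               continuation();
               return;
           }
           m_continuation = std::move(continuation);
       }
       // Forked side: called once by each forked process when it finishes.
       void done() {
           if (m_counter == 0) return;  // sketch: late finishers are ignored
           if (--m_counter == 0 && m_continuation) {
               std::function<void()> resumeForker = std::move(m_continuation);
               m_continuation = nullptr;
               resumeForker();
           }
       }
   };
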

``VlForever``
~~~~~~~~~~~~~

A small utility awaitable type. It allows for blocking a coroutine forever. It
is currently only used for ``wait`` statements that await a constant false
condition. See the `Timing Pass` section for more details.


Timing Pass
~~~~~~~~~~~

There are two visitors in ``V3Timing.cpp``.

The first one, ``TimingSuspendableVisitor``, does not perform any AST
transformations. It is responsible for marking processes and C++ functions that
contain timing controls as suspendable. Processes that call suspendable
functions are also marked as suspendable. Functions that call, are overridden
by, or override suspendable functions are marked as suspendable as well.

The visitor keeps a dependency graph of functions and processes to handle such
cases. A function or process is dependent on a function if it calls it. A
virtual class method is dependent on another class method if it calls it,
overrides it, or is overridden by it.

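
Marking suspendable functions and processes is a reachability problem on that
dependency graph. A minimal sketch, assuming the dependency edges have already
been collected into a map from each function to the functions and processes
that depend on it (``DependentsMap`` and ``markSuspendable`` are hypothetical
names): a worklist propagates the mark outwards from everything that directly
contains a timing control:

::

   #include <cassert>
   #include <map>
   #include <queue>
   #include <set>
   #include <string>
   #include <vector>

   // dependents[f] lists the functions/processes that depend on f.
   using DependentsMap = std::map<std::string, std::vector<std::string>>;

   // Mark everything that (transitively) depends on a directly suspendable
   // function or process, using breadth-first propagation.
   std::set<std::string> markSuspendable(const DependentsMap& dependents,
                                         const std::set<std::string>& hasTimingControl) {
       std::set<std::string> suspendable = hasTimingControl;
       std::queue<std::string> work;
       for (const std::string& name : hasTimingControl) work.push(name);
       while (!work.empty()) {
           const std::string name = work.front();
           work.pop();
           const auto it = dependents.find(name);
           if (it == dependents.end()) continue;
           for (const std::string& dep : it->second) {
               if (suspendable.insert(dep).second) work.push(dep);  // newly marked
           }
       }
       return suspendable;
   }
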

The second visitor in ``V3Timing.cpp``, ``TimingControlVisitor``, uses the
information provided by ``TimingSuspendableVisitor`` and transforms each timing
control into a ``co_await``.

* event controls are turned into ``co_await`` on a trigger scheduler's
  ``trigger`` method. The awaited trigger scheduler is the one corresponding to
  the sentree referenced by the event control. This sentree is also referenced
  by the ``AstCAwait`` node, to be used later by the static scheduling code.
* if an event control waits on a local variable or class member, it uses a
  local trigger which it evaluates inline. It awaits a dynamic trigger
  scheduler multiple times: for trigger evaluation, updates, and resumption.
  The dynamic trigger scheduler is responsible for resuming the coroutine at
  the correct point of evaluation.
* delays are turned into ``co_await`` on a delay scheduler's ``delay`` method.
  The created ``AstCAwait`` nodes also reference a special sentree related to
  delays, to be used later by the static scheduling code.
* ``join`` and ``join_any`` are turned into ``co_await`` on a ``VlForkSync``'s
  ``join`` method. Each forked process gets a ``VlForkSync::done`` call at the
  end.

Assignments with intra-assignment timing controls are simplified into
assignments after those timing controls, with the LHS and RHS values evaluated
before them and stored in temporary variables.

``wait`` statements are transformed into while loops that check the condition
and then await changes in variables used in the condition. If the condition is
always false, the ``wait`` statement is replaced by a ``co_await`` on a
``VlForever``. This is done instead of a return in case the ``wait`` is deep in
a call stack (otherwise, the coroutine's caller would continue execution).

Each sub-statement of a ``fork`` is put in an ``AstBegin`` node for easier
grouping. In a later step, each of these gets transformed into a new, separate
function. See the `Forks` section for more detail.

Suspendable functions get the return type of ``VlCoroutine``, which makes them
coroutines. Later, during ``V3Sched``, suspendable processes are also
transformed into coroutines.

Scheduling with timing
|
|
~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Timing features in Verilator are built on top of the static scheduler. Triggers
|
|
are used for determining which delay or trigger schedulers should resume. A
|
|
special trigger is used for the delay scheduler. This trigger is set if there
|
|
are any coroutines awaiting the current simulation time
|
|
(``VlDelayScheduler::awaitingCurrentTime()``).
|
|
|
|
All triggers used by a suspendable process are mapped to variables written in
|
|
that process. When ordering code using ``V3Order``, these triggers are provided
|
|
as external domains of these variables. This ensures that the necessary
|
|
combinational logic is triggered after a coroutine resumption.
|
|
|
|
There are two functions for managing timing logic called by ``_eval()``:
|
|
|
|
* ``_timing_commit()``, which commits all coroutines whose triggers were not set
|
|
in the current iteration,
|
|
* ``_timing_resume()``, which calls `resume()` on all trigger and delay
|
|
schedulers whose triggers were set in the current iteration.
|
|
|
|
Thanks to this separation, a coroutine awaiting a trigger cannot be suspended
|
|
and resumed in the same iteration, and it cannot be resumed before it suspends.
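The commit/resume ordering can be illustrated with a toy model in which each suspended process is stored as a plain callback instead of a C++20 coroutine, so the sketch compiles as ordinary C++14. ``ToyTriggerScheduler`` and ``actIteration`` are hypothetical names; this models only the ordering rule described above, not Verilator's runtime trigger scheduler class.

.. code-block:: cpp

   #include <cassert>
   #include <functional>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical toy model of one trigger scheduler. A suspended process is
   // stored as a callback ("continuation") rather than a coroutine handle.
   class ToyTriggerScheduler {
       std::vector<std::function<void()>> m_uncommitted;  // suspended, not yet committed
       std::vector<std::function<void()>> m_ready;  // committed, resumed on next trigger

   public:
       // A process suspends, waiting for this trigger.
       void await(std::function<void()> continuation) {
           m_uncommitted.push_back(std::move(continuation));
       }
       // Models _timing_commit(): only called when the trigger was NOT set.
       void commit() {
           for (auto& c : m_uncommitted) m_ready.push_back(std::move(c));
           m_uncommitted.clear();
       }
       // Models _timing_resume(): only called when the trigger WAS set.
       void resume() {
           std::vector<std::function<void()>> toRun;
           toRun.swap(m_ready);  // processes that re-await land in m_uncommitted
           for (auto& c : toRun) c();
       }
   };

   // One iteration of the 'act' loop for a single trigger.
   void actIteration(ToyTriggerScheduler& sched, bool triggerSet) {
       if (triggerSet) {
           sched.resume();
       } else {
           sched.commit();
       }
   }

A process that suspends just before an iteration in which its trigger is set is not resumed by that iteration, because it has not been committed yet. It is committed in a later iteration where the trigger is not set, and resumed only the next time the trigger fires.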

All coroutines are committed and resumed in the 'act' eval loop. With timing
features enabled, the ``_eval()`` function takes this form:

::

   void _eval() {
       while (true) {
           _eval__triggers__ico();
           if (!ico_triggers.any()) break;
           _eval_ico();
       }

       while (true) {
           while (true) {
               _eval__triggers__act();

               // Commit all non-triggered coroutines
               _timing_commit();

               if (!act_triggers.any()) break;
               latch_act_triggers_for_nba();

               // Resume all triggered coroutines
               _timing_resume();

               _eval_act();
           }
           if (!nba_triggers.any()) break;
           _eval_nba();
       }
   }

Forks
~~~~~

After the scheduling step, fork sub-statements are transformed into separate
functions, and these functions are called in place of the sub-statements. These
calls must be made without ``co_await``, so that suspension of a forked process
doesn't suspend the forking process.

In forked processes, references to local variables are only allowed in
``fork..join``, as this is the only case that ensures the lifetime of these
locals is at least as long as the execution of the forked processes.


Multithreaded Mode
------------------

In ``--threads`` mode, the frontend of the Verilator pipeline is the same
as serial mode, up until ``V3Order``.

``V3Order`` builds a fine-grained, statement-level dependency graph that
governs the ordering of code within a single ``eval()`` call. In serial
mode, that dependency graph is used to order all statements into a total
serial order. In parallel mode, the same dependency graph is the starting
point for a partitioner (``V3Partition``).

The partitioner's goal is to coarsen the fine-grained graph into a coarser
graph, while maintaining as much available parallelism as possible. Often
the partitioner can transform an input graph with millions of nodes into a
coarsened execution graph with a few dozen nodes, while maintaining enough
parallelism to take advantage of a modern multicore CPU. Runtime
synchronization cost is reasonable with so few nodes.


Partitioning
~~~~~~~~~~~~

Our partitioner is similar to the one Vivek Sarkar described in his 1989
paper *Partitioning and Scheduling Parallel Programs for Multiprocessors*.

Let's define some terms:


Par Factor
~~~~~~~~~~

The available parallelism or "par-factor" of a DAG is the total cost to
execute all nodes, divided by the cost to execute the longest critical path
through the graph. This is the speedup you would get from running the graph
in parallel, given infinitely many CPU cores and zero communication and
synchronization cost.
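As a worked illustration of this definition, the sketch below computes the par-factor of a small DAG. It assumes nodes are numbered in topological order and that node costs are already known; ``Dag``, ``criticalPath`` and ``parFactor`` are hypothetical helpers, not Verilator code.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cmath>
   #include <vector>

   // Hypothetical DAG: node i has cost[i]; succs[i] lists its successors.
   // Nodes are assumed to be in topological order (every edge is i -> j, i < j).
   struct Dag {
       std::vector<double> cost;
       std::vector<std::vector<int>> succs;
   };

   // Cost of the longest (critical) path, counting node costs only.
   double criticalPath(const Dag& g) {
       const int n = static_cast<int>(g.cost.size());
       std::vector<double> start(n, 0.0);  // longest path ending just before node i
       double best = 0.0;
       for (int i = 0; i < n; ++i) {
           const double finish = start[i] + g.cost[i];
           best = std::max(best, finish);
           for (int j : g.succs[i]) start[j] = std::max(start[j], finish);
       }
       return best;
   }

   // Total cost divided by critical path cost.
   double parFactor(const Dag& g) {
       double total = 0.0;
       for (double c : g.cost) total += c;
       return total / criticalPath(g);
   }

For a diamond-shaped graph with costs 1, 4, 2 and 1 (node 0 feeding nodes 1 and 2, both feeding node 3), the total cost is 8 and the critical path (1 + 4 + 1) is 6, so the par-factor is about 1.33.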


Macro Task
~~~~~~~~~~

When the partitioner coarsens the graph, it combines nodes together. Each
fine-grained node represents an atomic "task"; combined nodes in the
coarsened graph are "macro-tasks". This term comes from Sarkar. Each
macro-task executes from start to end on one processor, without any
synchronization to any other macro-task during its execution.
(Synchronization only happens before the macro-task begins or after it
ends.)


Edge Contraction
~~~~~~~~~~~~~~~~

Verilator's partitioner, like Sarkar's, primarily relies on "edge
contraction" to coarsen the graph. It starts with one macro-task per atomic
task and iteratively combines pairs of edge-connected macro-tasks.


Local Critical Path
~~~~~~~~~~~~~~~~~~~

Each node in the graph has a "local" critical path. That's the critical
path from the start of the graph to the start of the node, plus the node's
cost, plus the critical path from the end of the node to the end of the
graph.
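The definition translates directly into one forward and one backward pass over a topologically ordered graph. This is a minimal sketch; ``TaskGraph`` and ``localCriticalPaths`` are hypothetical names, and node costs are assumed to be given.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <vector>

   // Hypothetical task graph; nodes are numbered in topological order.
   struct TaskGraph {
       std::vector<double> cost;
       std::vector<std::vector<int>> succs;
   };

   // For every node: longest path from the graph start to the node's start,
   // plus the node's cost, plus longest path from the node's end to the graph end.
   std::vector<double> localCriticalPaths(const TaskGraph& g) {
       const int n = static_cast<int>(g.cost.size());
       std::vector<double> before(n, 0.0);  // longest path ending just before i
       for (int i = 0; i < n; ++i) {
           for (int j : g.succs[i]) before[j] = std::max(before[j], before[i] + g.cost[i]);
       }
       std::vector<double> after(n, 0.0);  // longest path starting just after i
       for (int i = n - 1; i >= 0; --i) {
           for (int j : g.succs[i]) after[i] = std::max(after[i], g.cost[j] + after[j]);
       }
       std::vector<double> result(n);
       for (int i = 0; i < n; ++i) result[i] = before[i] + g.cost[i] + after[i];
       return result;
   }

For a diamond graph with costs 1, 4, 2 and 1 (node 0 feeding nodes 1 and 2, both feeding node 3), every node on the path 0, 1, 3 has a local critical path of 6, while node 2, which is off the critical path, has 1 + 2 + 1 = 4.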

Sarkar calls out an important trade-off: coarsening the graph reduces
runtime synchronization overhead among the macro-tasks, but it tends to
increase the critical path through the graph and thus reduces par-factor.

Sarkar's partitioner, and ours, chooses pairs of macro-tasks to merge such
that the growth in critical path is minimized. Each candidate merge would
result in a new node, which would have some local critical path. We choose
the candidate that would produce the shortest local critical path, and
repeat until par-factor falls to a target threshold. It's a greedy
algorithm, and it's not guaranteed to produce the best partition (which
Sarkar proves is NP-hard).


Estimating Logic Costs
~~~~~~~~~~~~~~~~~~~~~~

To compute the cost of any given path through the graph, Verilator
estimates an execution cost for each task. Each macro-task has an execution
cost which is the sum of its tasks' costs. We assume that communication
overhead and synchronization overhead are zero, so the cost of any given
path through the graph is the sum of macro-task execution costs. Sarkar
does almost the same thing, except that he has nonzero estimates for
synchronization costs.

Verilator's cost estimates are assigned by ``InstrCountVisitor``. This
class is perhaps the most fragile piece of the multithread
implementation. It's easy to have a bug where you count something cheap
(e.g. accessing one element of a huge array) as if it were expensive (e.g.
by counting it as if it were an access to the entire array). Even without
such gross bugs, the estimates this produces are only loosely predictive of
actual runtime cost. Multithread performance would be better with better
runtime cost estimates. This is an area to improve.


Scheduling Macro-Tasks at Runtime
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After coarsening the graph, we must schedule the macro-tasks for
runtime. Sarkar describes two options. The first is to schedule tasks
dynamically at runtime, with a runtime graph follower; Sarkar calls this
the "macro-dataflow model." Verilator does not support this; early
experiments with this approach had poor performance.

The other option is to statically assign macro-tasks to threads, with each
thread running its macro-tasks in a static order. Sarkar describes this in
Chapter 5. Verilator takes this static approach. The only dynamic aspect is
that each macro-task may block before starting, to wait until its
prerequisites on other threads have finished.

The synchronization cost is cheap if the prerequisites are done. If they're
not, fragmentation (idle CPU cores waiting) is possible. This is the major
source of overhead in this approach. The ``--prof-exec`` switch and the
``verilator_gantt`` script can visualize the time lost to such
fragmentation.
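To make the cost of fragmentation concrete, the sketch below simulates a static schedule: each thread runs a fixed list of macro-tasks, and a macro-task starts only once its thread is free and all of its prerequisites have finished. ``MTask``, ``ScheduleResult`` and ``simulate`` are hypothetical, assume perfectly accurate costs and zero synchronization overhead, and are not the code behind ``--prof-exec`` or ``verilator_gantt``.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Hypothetical macro-task description.
   struct MTask {
       double cost;
       std::vector<int> prereqs;  // macro-tasks that must finish first
   };

   struct ScheduleResult {
       double makespan = 0.0;  // time at which the last thread finishes
       double waiting = 0.0;   // total time threads spent blocked on prerequisites
   };

   // threads[t] lists, in order, the macro-tasks statically assigned to thread t.
   // Assumes the schedule is valid (no circular waits between threads).
   ScheduleResult simulate(const std::vector<MTask>& tasks,
                           const std::vector<std::vector<int>>& threads) {
       std::vector<double> finish(tasks.size(), -1.0);    // -1: not simulated yet
       std::vector<std::size_t> next(threads.size(), 0);  // next position per thread
       std::vector<double> clock(threads.size(), 0.0);    // current time per thread
       ScheduleResult result;
       bool progress = true;
       while (progress) {
           progress = false;
           for (std::size_t t = 0; t < threads.size(); ++t) {
               if (next[t] >= threads[t].size()) continue;  // thread is done
               const int id = threads[t][next[t]];
               double ready = 0.0;  // when the last prerequisite finishes
               bool known = true;
               for (int p : tasks[id].prereqs) {
                   if (finish[p] < 0.0) {
                       known = false;
                   } else {
                       ready = std::max(ready, finish[p]);
                   }
               }
               if (!known) continue;  // revisit after other threads advance
               const double start = std::max(clock[t], ready);
               result.waiting += start - clock[t];
               clock[t] = start + tasks[id].cost;
               finish[id] = clock[t];
               ++next[t];
               progress = true;
           }
       }
       for (double c : clock) result.makespan = std::max(result.makespan, c);
       return result;
   }

With task 0 (cost 2) and then task 2 (cost 1, needing tasks 0 and 1) on thread 0, and task 1 (cost 3) on thread 1, thread 0 waits one time unit for task 1 and the schedule finishes at time 4.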


Locating Variables for Best Spatial Locality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After scheduling all code, we attempt to locate variables in memory, such
that variables accessed by a single macro-task are close together in
memory. This provides "spatial locality" - when we pull in a 64-byte cache
line to access a 2-byte variable, we want the other 62 bytes to be ones
we'll also likely access soon, for best cache performance.

This is critical for performance. It should allow Verilator
to scale to very large models. We don't rely on our working set fitting
in any CPU cache; instead we essentially "stream" data into caches from
memory. It's not literally streaming, where the address increases
monotonically, but it should have similar performance characteristics,
so long as each macro-task's dataset fits in one core's local caches.

To achieve spatial locality, we tag each variable with the set of
macro-tasks that access it. Let's call this set the "footprint" of that
variable. The variables in a given module have a set of footprints. We
can order those footprints to minimize the distance between them
(distance is the number of macro-tasks that are different across any two
footprints) and then emit all variables into the struct in
ordered-footprint order.

The footprint ordering is literally the traveling salesman problem, and
we use a TSP-approximation algorithm to get close to an optimal sort.
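The sketch below shows the footprint distance and a simple greedy ordering. It substitutes a nearest-neighbor heuristic, which is easy to read but generally weaker than other TSP approximations; Verilator's own algorithm is not reproduced here, and ``Footprint``, ``footprintDistance`` and ``orderFootprints`` are hypothetical names.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <set>
   #include <vector>

   using Footprint = std::set<int>;  // IDs of the macro-tasks that access a variable

   // Number of macro-tasks present in one footprint but not the other.
   std::size_t footprintDistance(const Footprint& a, const Footprint& b) {
       std::size_t d = 0;
       for (int m : a) {
           if (!b.count(m)) ++d;
       }
       for (int m : b) {
           if (!a.count(m)) ++d;
       }
       return d;
   }

   // Greedy nearest-neighbor tour: start at footprint 0, then repeatedly move to
   // the closest unvisited footprint. Returns footprint indices in visiting order.
   std::vector<std::size_t> orderFootprints(const std::vector<Footprint>& fps) {
       std::vector<std::size_t> order;
       if (fps.empty()) return order;
       std::vector<bool> visited(fps.size(), false);
       std::size_t cur = 0;
       visited[cur] = true;
       order.push_back(cur);
       for (std::size_t step = 1; step < fps.size(); ++step) {
           std::size_t best = fps.size();  // fps.size() means "none chosen yet"
           std::size_t bestDist = 0;
           for (std::size_t i = 0; i < fps.size(); ++i) {
               if (visited[i]) continue;
               const std::size_t d = footprintDistance(fps[cur], fps[i]);
               if (best == fps.size() || d < bestDist) {
                   best = i;
                   bestDist = d;
               }
           }
           visited[best] = true;
           order.push_back(best);
           cur = best;
       }
       return order;
   }

For footprints ``{1}``, ``{5, 6}``, ``{1, 2}`` and ``{5}``, the tour visits indices 0, 2, 3, 1, so variables touched by macro-task 1 end up adjacent, followed by those touched by macro-task 5.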

This is an old idea. Simulators designed at DEC in the early 1990s used
similar techniques to optimize both single-thread and multithread
modes. (Verilator does not optimize variable placement for spatial
locality in serial mode; that is a possible area for improvement.)


Improving Multithreaded Performance Further (a TODO list)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Wave Scheduling
+++++++++++++++

To allow the Verilated model to run in parallel with the testbench, it
might be nice to support "wave" scheduling, in which work on a cycle begins
before ``eval()`` is called or continues after ``eval()`` returns. For now,
all work on a cycle happens during the ``eval()`` call, leaving Verilator's
threads idle while the testbench (everything outside ``eval()``) is
working. This would involve fundamental changes within the partitioner;
however, it's probably the best bet for hiding testbench latency.


Efficient Dynamic Scheduling
++++++++++++++++++++++++++++

To scale to more than a few threads, we may revisit a fully dynamic
scheduler. For large (>16 core) systems, it might make sense to dedicate an
entire core to scheduling, so that scheduler data structures would fit in
its L1 cache and thus the cost of traversing priority-ordered ready lists
would not be prohibitive.


Static Scheduling with Runtime Repack
+++++++++++++++++++++++++++++++++++++

We could modify the static scheduling approach by gathering actual
macro-task execution times at run time, and dynamically re-packing the
macro-tasks into the threads also at run time, for example once every
10,000 cycles. This has the potential to do better than our static
estimates about macro-task run times. It could potentially react to CPU
cores that aren't performing equally, due to NUMA, thermal throttling, or
nonuniform competing memory traffic.


Clock Domain Balancing
++++++++++++++++++++++

Right now Verilator makes no attempt to balance clock domains across
macro-tasks. For a multi-domain model, that could lead to bad Gantt chart
fragmentation. This could be improved if it's a real problem in practice.


Other Forms of MTask Balancing
++++++++++++++++++++++++++++++

The largest source of runtime overhead is idle CPUs, which happens due to
variance between our predicted runtime for each MTask and its actual
runtime. That variance is magnified if MTasks are homogeneous, containing
similar repeating logic which was generally close together in source code
and which is still packed together even after going through Verilator's
digestive tract.

If Verilator could avoid doing that, and instead would take source logic
that was close together and distribute it across MTasks, that would
increase the diversity of any given MTask, and this should reduce variance
in the cost estimates.

One way to do that might be to make various "tie breaker" comparison
routines in the sources rely more heavily on randomness, and
generally try harder not to keep input nodes together when we have the
option to scramble things.

Profile-guided optimization makes this a bit better, by adjusting MTask
scheduling, but this does not yet guide the packing into MTasks.


Performance Regression
++++++++++++++++++++++

It would be nice if we had a regression of large designs, with some
diversity of design styles, to test on both single- and multithreaded
modes. This would help to avoid performance regressions, and also to
evaluate the optimizations while minimizing the impact of parasitic noise.


Per-Instance Classes
++++++++++++++++++++

If we have multiple instances of the same module, and they partition
differently (likely; we make no attempt to partition them the same), then
the variable sort will be suboptimal for either instance. A possible
improvement would be to emit a unique class for each instance of a module,
and sort its variables optimally for that instance's code stream.


Verilated Flow
--------------

The evaluation loop emitted by Verilator is designed to allow a single
function to perform evaluation under most situations.

On the first evaluation, the Verilated code calls initial blocks, and then
"settles" the modules, by evaluating functions (from always statements)
until all signals are stable.

On other evaluations, the Verilated code detects which input signals have
changed. If any are clocks, it calls the appropriate sequential functions
(from ``always @ posedge`` statements). Interspersed with sequential
functions, it calls combo functions (from ``always @*``). After this is
complete, it detects any changes due to combo loops or internally generated
clocks, and if any are found, it must reevaluate the model.

For SystemC code, the ``eval()`` function is wrapped in a SystemC
``SC_METHOD``, sensitive to all inputs. (Ideally, it would only be sensitive
to clocks and combo inputs, but tracing requires all signals to cause
evaluation, and the performance difference is small.)

If tracing is enabled, a callback examines all variables in the design for
changes, and writes the trace for each change. To accelerate this process,
the evaluation process records a bitmask of variables that might have
changed; where a bit is clear, checking the corresponding signals for
changes may be skipped.
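The following toy model illustrates the bitmask idea: evaluation code marks a group of variables as possibly changed when it writes to it, and the trace callback only compares the marked groups. ``ToyTracer`` and its members are hypothetical; this is not the code Verilator emits for tracing.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical change-driven tracer with one "may have changed" bit per group.
   struct ToyTracer {
       std::vector<std::vector<std::uint32_t>> groups;      // current values
       std::vector<std::vector<std::uint32_t>> lastDumped;  // values at last dump
       std::vector<bool> activity;  // set when a group may have changed
       int comparisons = 0;         // instrumentation for the example only

       explicit ToyTracer(std::vector<std::vector<std::uint32_t>> init)
           : groups(init), lastDumped(init), activity(init.size(), false) {}

       // Called by "evaluation" code whenever it writes a variable.
       void write(std::size_t group, std::size_t idx, std::uint32_t value) {
           groups[group][idx] = value;
           activity[group] = true;
       }

       // Emits changed values; returns how many changed. Inactive groups are skipped.
       int dump() {
           int changes = 0;
           for (std::size_t g = 0; g < groups.size(); ++g) {
               if (!activity[g]) continue;  // bit clear: nothing could have changed
               for (std::size_t i = 0; i < groups[g].size(); ++i) {
                   ++comparisons;
                   if (groups[g][i] != lastDumped[g][i]) {
                       lastDumped[g][i] = groups[g][i];
                       ++changes;
                   }
               }
               activity[g] = false;
           }
           return changes;
       }
   };

After writing one variable in group 0, a dump compares only group 0's two values; a second dump with no writes compares nothing.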


Coding Conventions
==================


Compiler Version and C++14
--------------------------

Verilator requires C++14. Verilator does not require any newer versions, but
is maintained to build successfully with C++17 and C++20.


Indentation and Naming Style
----------------------------

We will work with contributors to fix up indentation style issues, but it
is appreciated if you could match our style:

- Use "mixedCapsSymbols" instead of "underlined_symbols".

- Use a "p" suffix on variables that are pointers, e.g., "nodep".

- Comment every member variable.

- In the include directory, use ``///`` to document functions the user
  calls. (This convention has not been applied retroactively.)

C++ and Python indentation is automatically maintained with "make format"
using clang-format version 10.0.0 for C++ and yapf for Python, and is
automatically corrected in the CI actions. For those manually formatting
C++ code:

- Use four spaces per level, and no tabs.

- Use two spaces between the end of source and the beginning of a
  comment.

- Use one space after if/for/switch/while and similar keywords.

- No spaces before semicolons, nor between a function's name and open
  parenthesis (only applies to functions; if/else has a following space).


The ``astgen`` Script
---------------------

The ``astgen`` script is used to generate some of the repetitive C++ code
related to the ``AstNode`` type hierarchy. Examples are the abstract ``visit``
methods in ``VNVisitor``. There are other uses; please see the ``*__gen*``
files in the build directories and the ``astgen`` script for details. A
description of the more advanced features of ``astgen`` is provided here.


Generating ``AstNode`` members
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some of the members of ``AstNode`` sub-classes are generated by ``astgen``.
These are emitted as pre-processor macro definitions, which then need to be
added to the ``AstNode`` sub-classes they correspond to. Specifically,
``class AstFoo`` should contain an instance of ``ASTGEN_MEMBERS_AstFoo;`` at
class scope. The ``astgen`` script checks and errors if this is not present.
The members generated depend on whether the class is a concrete final class,
or an abstract ``AstNode*`` base-class, and on ``@astgen`` directives present
in comment sections in the body of the ``AstNode`` sub-class definitions.


List of ``@astgen`` directives
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``@astgen`` directives in comments contained in the body of ``AstNode``
sub-class definitions are parsed and contribute to the code generated by
``astgen``. The general syntax is ``@astgen <keywords> := <description>``,
where ``<keywords>`` determines what is being defined, and ``<description>`` is
a ``<keywords>``-dependent description of the definition. The ``@astgen``
directives are as follows:


``op<N>`` operand directives
+++++++++++++++++++++++++++++

The ``op1``, ``op2``, ``op3`` and ``op4`` directives are used to describe the
name and type of up to four child operands of a node. The syntax of the
``<description>`` field is ``<identifier> : <type>``, where ``<identifier>``
will be used as the base name of the generated operand accessors, and
``<type>`` is one of:

1. An ``AstNode`` sub-class, defining the operand to be of that type, never
   null, and with an always null ``nextp()``. That is, the child node is
   always present, and is a single ``AstNode`` (as opposed to a list).

2. ``Optional[<AstNode sub-class>]``. This is just like in point 1 above, but
   defines the child node to be optional, meaning it may be null.

3. ``List[<AstNode sub-class>]`` describes a list operand, which means the
   child node may have a non-null ``nextp()`` and in addition the child itself
   may be null, representing an empty list.


An example of the full syntax of the directive is
``@astgen op1 := lhsp : AstNodeExpr``.

``astgen`` generates accessors for the child nodes based on these directives.
For non-list children, the getter and the setter are both named after the
given ``<identifier>``. For list-type children, the getter is
``<identifier>``, and instead of a setter, an ``add<Identifier>`` method is
generated that appends new nodes (or lists of nodes) to the child list.


``alias op<N>`` operand alias directives
++++++++++++++++++++++++++++++++++++++++

If a super-class already defined a name and type for a child node using the
``op<N>`` directive, but a more appropriate name exists in the context of a
sub-class, then the alias directive can be used to introduce an additional
name for the child node. The syntax is ``alias op<N> := <identifier>``, where
``<identifier>`` is the new name. ``op<N>`` must have been defined in some
super-class of the current node.

Example: ``@astgen alias op1 := condp``


Generating ``DfgVertex`` sub-classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most of the ``DfgVertex`` sub-classes are generated by ``astgen``, from the
definitions of the corresponding ``AstNode`` sub-classes.


Additional features of ``astgen``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to generating ``AstNode`` members as described above,
``astgen`` is also used to handle some of the repetitive implementation code
that is still too varied to be handled by C++ macros.

In particular, ``astgen`` is used to pre-process some of the C++ source
files. For example, in ``V3Const.cpp``, it is used to implement the
``visit()`` functions for each binary operation using the ``TREEOP`` macro.

The original C++ source code is transformed into C++ code in the ``obj_opt``
and ``obj_dbg`` sub-directories (the former for the optimized version of
Verilator, the latter for the debug version). For example, ``V3Const.cpp``
is transformed into ``V3Const__gen.cpp``.


Visitor Functions
-----------------

Verilator uses the "Visitor" design pattern to implement its refinement and
optimization passes. This allows separation of the pass algorithm from the
AST on which it operates. Wikipedia provides an introduction to the concept
at https://en.wikipedia.org/wiki/Visitor_pattern.

As noted above, all visitors are derived classes of ``VNVisitor``. All
derived classes of ``AstNode`` implement the ``accept`` method, which takes
as argument a reference to an instance of a ``VNVisitor`` derived class
and applies the visit method of the ``VNVisitor`` to the invoking ``AstNode``
instance (i.e. ``this``).

One possible difficulty is that a call to ``accept`` may perform an edit
which destroys the node it receives as an argument. The
``acceptSubtreeReturnEdits`` method of ``AstNode`` is provided to apply
``accept`` and return the resulting node, even if the original node is
destroyed (if it is not destroyed, it will just return the original node).

The behavior of the visitor classes is achieved by overloading the
``visit`` function for the different ``AstNode`` derived classes. If a
specific implementation is not found, the system will look in turn for
overloaded implementations up the inheritance hierarchy. For example,
calling ``accept`` on ``AstIf`` will look in turn for:

::

   void visit(AstIf* nodep)
   void visit(AstNodeIf* nodep)
   void visit(AstNodeStmt* nodep)
   void visit(AstNode* nodep)
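One way to obtain this lookup order in plain C++ is for each default ``visit`` overload to forward to the overload for the parent class, so a visitor only overrides the levels it cares about. The classes below are a hypothetical miniature that only mimics the shape of the hierarchy; they are not the real ``AstNode`` or ``VNVisitor`` API, and this is not necessarily how the generated ``VNVisitor`` code is written.

.. code-block:: cpp

   #include <cassert>

   struct AstNode;
   struct AstNodeStmt;
   struct AstNodeIf;
   struct AstIf;

   // Hypothetical visitor base: each default overload forwards to its parent's.
   struct Visitor {
       virtual ~Visitor() = default;
       virtual void visit(AstNode*) {}
       virtual void visit(AstNodeStmt* nodep);
       virtual void visit(AstNodeIf* nodep);
       virtual void visit(AstIf* nodep);
   };

   // Each class's accept() calls the overload for its own static type.
   struct AstNode {
       virtual ~AstNode() = default;
       virtual void accept(Visitor& v) { v.visit(this); }
   };
   struct AstNodeStmt : AstNode {
       void accept(Visitor& v) override { v.visit(this); }
   };
   struct AstNodeIf : AstNodeStmt {
       void accept(Visitor& v) override { v.visit(this); }
   };
   struct AstIf final : AstNodeIf {
       void accept(Visitor& v) override { v.visit(this); }
   };

   void Visitor::visit(AstNodeStmt* nodep) { visit(static_cast<AstNode*>(nodep)); }
   void Visitor::visit(AstNodeIf* nodep) { visit(static_cast<AstNodeStmt*>(nodep)); }
   void Visitor::visit(AstIf* nodep) { visit(static_cast<AstNodeIf*>(nodep)); }

   // Handles AstNodeIf and AstNode only; AstIf reaches visit(AstNodeIf*) via
   // the forwarding chain, and AstNodeStmt falls through to visit(AstNode*).
   struct IfCounter : Visitor {
       using Visitor::visit;  // keep the non-overridden overloads visible
       int ifs = 0;
       int others = 0;
       void visit(AstNodeIf*) override { ++ifs; }
       void visit(AstNode*) override { ++others; }
   };

Calling ``accept`` on an ``AstIf`` with an ``IfCounter`` increments ``ifs``, because the default ``visit(AstIf*)`` forwards to the overridden ``visit(AstNodeIf*)``.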

There are three ways data is passed between visitor functions.

1. A visitor-class member variable. This is generally for passing
   "parent" information down to children. ``m_modp`` is a common
   example. It's set to NULL in the constructor; the ``AstModule`` visitor
   then sets it, iterates the children, and clears it again. Children
   under an ``AstModule`` will see it set, while nodes elsewhere will see it
   clear. If there can be nested items (for example an ``AstFor`` under an
   ``AstFor``), the variable needs to be save-set-restored in the ``AstFor``
   visitor; otherwise exiting the inner ``AstFor`` will lose the outer
   ``AstFor``'s setting.

2. User attributes. Each ``AstNode`` (**Note.** The AST node, not the
   visitor) has five user attributes, which may be accessed as an
   integer using the ``user1()`` through ``user5()`` methods, or as a
   pointer (of type ``AstNUser``) using the ``user1p()`` through
   ``user5p()`` methods (a common technique lifted from graph traversal
   packages).

   A visitor first clears the one it wants to use by calling
   ``AstNode::user#ClearTree()``, then it can mark any node's
   ``user#()`` with whatever data it wants. Readers just call
   ``nodep->user#()``, but may need to cast appropriately, so you'll often
   see ``VN_CAST(nodep->userp(), SOMETYPE)``. At the top of each visitor
   are comments describing how the ``user()`` stuff applies to that
   visitor class. For example:

   ::

      // NODE STATE
      // Cleared entire netlist
      //   AstModule::user1()     // bool. True to inline this module

   This says that ``user1ClearTree()`` is called on the ``AstNetlist``, and
   each ``AstModule``'s ``user1()`` is used to indicate whether we're going
   to inline it.

   These comments are important to make sure a ``user#()`` on a given
   ``AstNode`` type is never being used for two different purposes.

   Note that calling ``user#ClearTree`` is fast; it doesn't walk the
   tree, so it's ok to call fairly often. For example, it's commonly
   called on every module.

3. Parameters can be passed between the visitors in roughly the
   "normal" caller-to-callee way. This is the second ``vup``
   parameter of type ``AstNUser`` that is ignored on most of the visitor
   functions. V3Width does this, but it proved messier than the above
   and is deprecated. (V3Width was nearly the first module written.
   Someday this scheme may be removed, as it slows the program down to
   have to pass ``vup`` everywhere.)
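Clearing a user attribute across the whole tree can be cheap because no node needs to be touched. A simplified sketch of one way to achieve this is a generation counter: clearing bumps a global counter, and a node's stored value only counts if it was written during the current generation. ``UserField`` is a hypothetical class illustrating the idea, not Verilator's implementation.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Hypothetical per-node user field with O(1) "clear the whole tree".
   class UserField {
       static std::uint32_t s_generation;  // bumped by clearTree()
       std::uint32_t m_generation = 0;     // generation in which m_value was written
       int m_value = 0;

   public:
       // Invalidates every UserField at once without visiting any node.
       static void clearTree() { ++s_generation; }
       // Values written in an older generation read as cleared (0).
       int get() const { return m_generation == s_generation ? m_value : 0; }
       void set(int value) {
           m_value = value;
           m_generation = s_generation;
       }
   };
   // Start at 1 so freshly constructed fields (generation 0) read as cleared.
   std::uint32_t UserField::s_generation = 1;

Setting a value and then calling ``UserField::clearTree()`` makes every field read as 0 again, at the cost of one counter increment.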


Iterators
---------

``VNVisitor`` provides a set of iterators to facilitate walking over
the tree. Each operates on the current ``VNVisitor`` class (as this)
and takes an argument of type ``AstNode*``.

``iterate``
   Applies the ``accept`` method of the ``AstNode`` to the visitor
   function.

``iterateAndNextIgnoreEdit``
   Applies the ``accept`` method of each ``AstNode`` in a list (i.e.
   connected by ``nextp`` and ``backp`` pointers).

``iterateAndNextNull``
   Applies the ``accept`` method of each ``AstNode`` in a list, only if
   the provided node is non-NULL. If a node is edited by the call to
   ``accept``, apply ``accept`` again, until the node does not change.

``iterateListBackwards``
   Applies the ``accept`` method of each ``AstNode`` in a list, starting
   with the last one.

``iterateChildren``
   Applies the ``iterateAndNextNull`` method on each child ``op1p``
   through ``op4p`` in turn.

``iterateChildrenBackwards``
   Applies the ``iterateListBackwards`` method on each child ``op1p``
   through ``op4p`` in turn.


Caution on Using Iterators When Child Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Visitors often replace one node with another node; V3Width and V3Const
are major examples. A visitor which is the parent of such a replacement
needs to be aware that calling iteration may cause the children to
change. For example:

::

   // nodep->lhsp() is 0x1234000
   iterateAndNextNull(nodep->lhsp());  // and under the covers nodep->lhsp() changes
   // nodep->lhsp() is 0x5678400
   iterateAndNextNull(nodep->lhsp());

This will work fine, as even if the first iterate causes a new node to take
the place of the ``lhsp()``, that edit will update ``nodep->lhsp()``, and
the second call will correctly see the change. Alternatively:

::

   lp = nodep->lhsp();
   // nodep->lhsp() is 0x1234000, lp is 0x1234000
   iterateAndNextNull(lp);  // and under the covers nodep->lhsp() changes
   // nodep->lhsp() is 0x5678400, lp is 0x1234000
   iterateAndNextNull(lp);

This will cause bugs or a core dump, as ``lp`` is a dangling pointer. Thus
it is advisable to set ``lp = NULL`` immediately after the first
``iterateAndNextNull(lp)`` call, so that the stale pointer cannot be
dereferenced by mistake. Another alternative used in special cases, mostly
in V3Width, is to use ``acceptSubtreeReturnEdits``, which operates on a
single node and returns the new pointer if any. Note that
``acceptSubtreeReturnEdits`` does not follow ``nextp()`` links.

::

   lp = acceptSubtreeReturnEdits(lp);


Identifying Derived Classes
---------------------------

A common requirement is to identify the specific ``AstNode`` class we
are dealing with. For example, a visitor might not implement separate
``visit`` methods for ``AstIf`` and ``AstGenIf``, but just a single
method for the base class:

::

   void visit(AstNodeIf* nodep)

However, that method might want to execute additional code if it is
called for ``AstGenIf``. Verilator supports this with the ``VN_IS``
macro, which returns true if the node is of the given type (or derived
from that type). So our ``visit`` method could use:

::

   if (VN_IS(nodep, AstGenIf)) {
       // code specific to AstGenIf
   }

Additionally, the ``VN_CAST`` macro converts pointers similar to C++
``dynamic_cast``. ``VN_CAST(nodep, SOMETYPE)`` either returns a pointer to
the object cast to that type (if it is of class ``SOMETYPE``, or a derived
class of ``SOMETYPE``) or else NULL. (However, for true/false tests, use
``VN_IS`` as that is faster.)
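For illustration only, the semantics of ``VN_IS`` and ``VN_CAST`` can be mimicked with ``dynamic_cast``. The ``isA`` and ``castTo`` templates and the tiny class hierarchy below are hypothetical, and this sketch makes no claim about how Verilator's macros are actually implemented.

.. code-block:: cpp

   #include <cassert>

   // Hypothetical miniature hierarchy standing in for the real AstNode classes.
   struct AstNode {
       virtual ~AstNode() = default;
   };
   struct AstNodeIf : AstNode {};
   struct AstIf final : AstNodeIf {};
   struct AstGenIf final : AstNodeIf {};

   // VN_IS-like test: true if the dynamic type of *nodep is T or derived from T.
   template <typename T, typename U>
   bool isA(const U* nodep) {
       return dynamic_cast<const T*>(nodep) != nullptr;
   }

   // VN_CAST-like conversion: the cast pointer, or nullptr if the type does not match.
   template <typename T, typename U>
   T* castTo(U* nodep) {
       return dynamic_cast<T*>(nodep);
   }

With an ``AstNodeIf*`` that actually points to an ``AstGenIf``, ``isA<AstGenIf>`` is true and ``castTo<AstGenIf>`` returns the object; for an ``AstIf`` the test is false and the cast yields ``nullptr``.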
|
|
|
|
|
|
.. _Testing:
|
|
|
|
Testing
|
|
=======
|
|
|
|
For an overview of how to write a test, see the BUGS section of the
|
|
`Verilator Manual <https://verilator.org/verilator_doc.html>`_.
|
|
|
|
It is important to add tests for failures as well as success (for
|
|
example to check that an error message is correctly triggered).
|
|
|
|
Tests that fail should, by convention have the suffix ``_bad`` in their
|
|
name, and include ``fails = 1`` in either their ``compile`` or
|
|
``execute`` step as appropriate.
|
|
|
|
|
|
Preparing to Run Tests
|
|
----------------------
|
|
|
|
For all tests to pass, you must install the following packages:
|
|
|
|
- SystemC to compile the SystemC outputs, see http://systemc.org
|
|
|
|
- Parallel::Forker from CPAN to run tests in parallel; you can install
|
|
this with e.g. "sudo cpan install Parallel::Forker".
|
|
|
|
- vcddiff to find differences in VCD outputs. See the readme at
|
|
https://github.com/veripool/vcddiff
|
|
|
|
- Cmake for build paths that use it.
|
|
|
|
|
|
Controlling the Test Driver
|
|
---------------------------
|
|
|
|
The test driver script `driver.pl` runs tests; see the `Test Driver`
|
|
section. The individual test drivers are written in Perl; see `Test
|
|
Language`.
|
|
|
|
|
|
Manual Test Execution
---------------------

A specific regression test can be executed manually. To start the
"EXAMPLE" test, run the following command.

::

   test_regress/t/t_EXAMPLE.pl

Regression Testing for Developers
---------------------------------

Developers will also want to call ``./configure`` with two extra flags:

``--enable-ccwarn``
   This causes the build to stop on warnings as well as errors. A good way
   to ensure no sloppy code gets added; however it can be painful when it
   comes to testing, since third party code used in the tests (e.g.
   SystemC) may not be warning free.

``--enable-longtests``
   In addition to the standard C and SystemC examples, also run the tests
   in the ``test_regress`` directory when using ``make test``. This is
   disabled by default, as SystemC installation problems would otherwise
   falsely indicate a Verilator problem.

When enabling the long tests, some additional Perl modules are needed,
which you can install using cpan.

::

   cpan install Parallel::Forker

There are some traps to avoid when running regression tests:

- When checking the MANIFEST, the test will fail on unexpected code in the
  Verilator tree. So make sure to keep any such code outside the tree.

- Not all Linux systems install Perldoc by default. This is needed for the
  ``--help`` option to Verilator, and also for regression testing. This
  can be installed using CPAN:

  ::

     cpan install Pod::Perldoc

  Many Linux systems also offer a standard package for this. Red
  Hat/Fedora/CentOS offer *perl-Pod-Perldoc*, while
  Debian/Ubuntu/Linux Mint offer *perl-doc*.

- Running regression may exhaust resources on some Linux systems,
  particularly file handles and user processes. Increase these to 16,384
  and 4,096 respectively. The method of doing this is system-dependent,
  but on Fedora Linux it would require editing the
  ``/etc/security/limits.conf`` file as root.

Diffing generated code after changes
------------------------------------

When making a change in the code generation area that should not change the
actual emitted code, it is useful to perform a diff to make sure the emitted
code really did not change. To do this, the top level Makefile provides the
*test-snap* and *test-diff* targets:

- Run the test suite with ``make test``
- Take a snapshot with ``make test-snap``
- Apply your changes
- Run the test suite again with ``make test``
- See the changes in the output with ``make test-diff``

Continuous Integration
----------------------

Verilator uses GitHub Actions, which automatically tests the master branch
for test failures on new commits. It also runs a daily cron job to validate
all tests against different OS and compiler versions.

Developers can enable Actions on their GitHub repository so that the CI
environment can check their branches too, by enabling the build workflow:

- On GitHub, navigate to the main page of the repository.

- Under your repository name, click Actions.

- In the left sidebar, click the workflow you want to enable ("build").

- Click Enable workflow.

Fuzzing
-------

There are scripts included to facilitate fuzzing of Verilator. These
have been successfully used to find a number of bugs in the frontend.

The scripts are based on using `American fuzzy
lop <https://lcamtuf.coredump.cx/afl/>`__ on a Debian-like system.

To get started, cd to ``nodist/fuzzer/`` and run ``./all``. A sudo password
may be required to set up the system for fuzzing.

Debugging
=========

Debug Levels
------------

The ``UINFO`` calls in the source indicate a debug level. Messages at level
3 and below are globally enabled with ``--debug``. Higher levels may be
controlled with ``--debugi <level>``. The level for an individual source
file may be controlled with ``--debugi-<srcfile> <level>``. For example,
``--debug --debugi 5 --debugi-V3Width 9`` will use the debug binary at
default debug level 5, with the V3Width.cpp file at level 9.

--debug
-------

When you run with ``--debug``, there are three primary output file types
placed into the obj_dir: .vpp, .tree and .dot files.

.vpp Output
-----------

Verilator creates a *{mod_prefix}*\ __inputs\ .vpp file containing all the
files that were read, filtered by preprocessing. This file can be fed back
into Verilator, replacing on the command line all of the previous input
files, to enable simplification of test cases.

Verilator also creates .vpp files for each individual file passed on the
command line.

.dot Output
-----------

Dot files are dumps of internal graphs in `GraphViz
<https://www.graphviz.org>`__ dot format. When a dot file is dumped,
Verilator will also print a line on stdout that can be used to format the
output, for example:

::

   dot -Tps -o ~/a.ps obj_dir/Vtop_foo.dot

You can then print a.ps. You may prefer gif format, which doesn't get
scaled so it can be more useful with large graphs.

For interactive graph viewing consider `xdot
<https://github.com/jrfonseca/xdot.py>`__ or `ZGRViewer
<http://zvtm.sourceforge.net/zgrviewer.html>`__. If you know of better
viewers (especially for large graphs) please let us know.

.tree Output
------------

Tree files are dumps of the AST Tree and are produced between every major
algorithmic stage. An example:

::

     NETLIST 0x90fb00 <e1> {a0ah}
       1: MODULE 0x912b20 <e8822> {a8ah}  top  L2 [P]
      *1:2: VAR 0x91a780 <e74#> {a22ah} @dt=0xa2e640(w32)  out_wide [O] WIRE
       1:2:1: BASICDTYPE 0xa2e640 <e2149> {e24ah} @dt=this(sw32)  integer kwd=integer range=[31:0]

The following summarizes the above example dump, with more detail on each
field in the section below.

+---------------+--------------------------------------------------------+
| ``1:2:``      | The hierarchy of the ``VAR`` is the ``op2p``           |
|               | pointer under the ``MODULE``, which in turn is the     |
|               | ``op1p`` pointer under the ``NETLIST``                 |
+---------------+--------------------------------------------------------+
| ``VAR``       | The AstNodeType (e.g. ``AstVar``).                     |
+---------------+--------------------------------------------------------+
| ``0x91a780``  | Address of this node.                                  |
+---------------+--------------------------------------------------------+
| ``<e74>``     | The 74th edit to the netlist was the last              |
|               | modification to this node.                             |
+---------------+--------------------------------------------------------+
| ``{a22ah}``   | This node is related to the source filename            |
|               | "a", where "a" is the first file read, "z" the 26th,   |
|               | and "aa" the 27th. Then line 22 in that file, then     |
|               | column 8 (aa=0, az=25, ba=26, ...).                    |
+---------------+--------------------------------------------------------+
| ``@dt=0x...`` | The address of the data type this node contains.       |
+---------------+--------------------------------------------------------+
| ``w32``       | The data-type width() is 32 bits.                      |
+---------------+--------------------------------------------------------+
| ``out_wide``  | The name() of the node, in this case, the name of the  |
|               | variable.                                              |
+---------------+--------------------------------------------------------+
| ``[O]``       | Flags which vary with the type of node, in this        |
|               | case, it means the variable is an output.              |
+---------------+--------------------------------------------------------+

In more detail, the following fields are common to all nodes. They are
produced by the ``AstNode::dump()`` method:

Tree Hierarchy
   The dump lines begin with numbers and colons to indicate the child
   node hierarchy. As noted above, ``AstNode`` has lists of items at the
   same level in the AST, connected by the ``nextp()`` and ``prevp()``
   pointers. These appear as nodes at the same level. For example, after
   inlining:

   ::

      NETLIST 0x929c1c8 <e1> {a0} w0
        1: MODULE 0x92bac80 <e3144> {e14} w0  TOP_t  L1 [P]
          1:1: CELLINLINE 0x92bab18 <e3686#> {e14} w0  v -> t
          1:1: CELLINLINE 0x92bc1d8 <e3688#> {e24} w0  v__DOT__i_test_gen -> test_gen
          ...
        1: MODULE 0x92b9bb0 <e503> {e47} w0  test_gen  L3
          ...

AstNode type
   The textual name of this node AST type (always in capitals). Many of
   these correspond directly to Verilog entities (for example ``MODULE``
   and ``TASK``), but others are internal to Verilator (for example
   ``NETLIST`` and ``BASICDTYPE``).

Address of the node
   A hexadecimal address of the node in memory. Useful for examining
   with the debugger. If the actual address values are not important,
   then using the ``--dump-tree-addrids`` option will convert address
   values to short identifiers of the form ``([A-Z]*)``, which is
   hopefully easier for the reader to cross-reference throughout the
   dump.

Last edit number
   Of the form ``<ennnn>`` or ``<ennnn#>``, where ``nnnn`` is the
   number of the last edit to modify this node. The trailing ``#``
   indicates the node has been edited since the last tree dump
   (typically in the last refinement or optimization pass). GDB can
   watch for this; see `Debugging with GDB`_.

Source file and line
   Of the form ``{xxnnnn}``, where ``xx`` is the filename letter (or
   letters) and ``nnnn`` is the line number within that file. The first
   file is ``a``, the 26th is ``z``, the 27th is ``aa``, and so on. It may
   be followed by two column letters, as in ``{a22ah}`` in the table above.

User pointers
   Shows the value of the node's ``user1p`` ... ``user4p``, if non-NULL.

Data type
   Many nodes have an explicit data type. ``@dt=0x...`` indicates the
   address of the data type (AstNodeDType) this node uses.

   If a data type is present and is numeric, it then prints the width of
   the item. This field is a sequence of flag characters and width data
   as follows:

   - ``s`` if the node is signed.

   - ``d`` if the node is a double (i.e. a floating point entity).

   - ``w`` always present, indicating this is the width field.

   - ``u`` if the node is unsized.

   - ``/nnnn`` if the node is unsized, where ``nnnn`` is the minimum
     width.

Name of the entity represented by the node if it exists
   For example, for a ``VAR`` this is the name of the variable.

Many nodes follow these fields with additional node-specific
information. Thus the ``VARREF`` node will print either ``[LV]`` or
``[RV]`` to indicate a left value or right value, followed by the node
of the variable being referred to. For example:

::

      1:2:1:1: VARREF 0x92c2598 <e509> {e24} w0  clk [RV] <- VAR 0x92a2e90 <e79> {e18} w0 clk [I] INPUT

In general, examine the ``dump()`` method in ``V3AstNodes.cpp`` of the node
type in question to determine additional fields that may be printed.

The ``MODULE`` has a list of ``CELLINLINE`` nodes referred to by its
``op1p()`` pointer, connected by ``nextp()`` and ``prevp()`` pointers.

Similarly, the ``NETLIST`` has a list of modules referred to by its
``op1p()`` pointer.

.tree.dot Output
----------------

``*.tree.dot`` files are dumps of the AST Tree in `GraphViz
<https://www.graphviz.org>`__ dot format. This can be used to visualize the
AST Tree. The vertices correspond to ``AstNode`` instances, and the edges
represent the pointers (``op1p``, ``op2p``, etc.) between the nodes.

Debugging with GDB
------------------

The ``driver.pl`` script accepts ``--debug --gdb`` to start
Verilator under gdb and break when an error is hit, or the program is about
to exit. You can also use ``--debug --gdbbt`` to just backtrace and then
exit gdb. To debug the Verilated executable, use ``--gdbsim``.

If you wish to start Verilator under GDB (or another debugger), then you
can use ``--debug`` and look at the underlying invocation of
``verilator_bin_dbg``. For example,

::

   t/t_alw_dly.pl --debug

shows it invokes the command:

::

   ../verilator_bin_dbg --prefix Vt_alw_dly --x-assign unique --debug
     -cc -Mdir obj_dir/t_alw_dly --debug-check -f input.vc t/t_alw_dly.v

Start GDB, then ``start`` with the remaining arguments.

::

   gdb ../verilator_bin_dbg
   ...
   (gdb) start --prefix Vt_alw_dly --x-assign unique --debug -cc -Mdir
               obj_dir/t_alw_dly --debug-check -f input.vc t/t_alw_dly.v
               > obj_dir/t_alw_dly/vlt_compile.log
   ...
   Temporary breakpoint 1, main (argc=13, argv=0xbfffefa4, env=0xbfffefdc)
       at ../Verilator.cpp:615
   615         ios::sync_with_stdio();
   (gdb)

You can then continue execution with breakpoints as required.

To break at a specific edit number which changed a node (presumably to
find what made a ``<e####>`` line in the tree dumps):

::

   watch AstNode::s_editCntGbl==####

Then, when the watch fires, to break at every following change to that
node:

::

   watch m_editCount

To print a node:

::

   pn nodep
   # or: call dumpGdb(nodep)  # aliased to "pn" in src/.gdbinit
   pnt nodep
   # or: call dumpTreeGdb(nodep)  # aliased to "pnt" in src/.gdbinit

When GDB halts, note that the backtrace will commonly show the iterator
functions between each invocation of ``visit``. You will typically see a
frame sequence something like:

::

   ...
   visit()
   iterateChildren()
   iterateAndNext()
   accept()
   visit()
   ...

Adding a New Feature
====================

Generally, what would you do to add a new feature?

1. File an issue (if there isn't one already) so others know what you're
   working on.

2. Make a testcase in the test_regress/t/t_EXAMPLE format, see `Testing`_.

3. If grammar changes are needed, look at the git version of Verilog-Perl's
   src/VParseGrammar.y, as this grammar supports the full SystemVerilog
   language and has a lot of back-and-forth with Verilator's grammar. Copy
   the appropriate rules to src/verilog.y and modify the productions.

4. If a new Ast type is needed, add it to the appropriate V3AstNode*.h.
   Follow the convention described above about the AstNode type hierarchy.
   Ordering of definitions is enforced by ``astgen``.

5. Now you can run ``test_regress/t/t_<newtestcase>.pl --debug`` and it'll
   probably fail, but you'll see a
   ``test_regress/obj_dir/t_<newtestcase>/*.tree`` file which you can examine
   to see if the parsing worked. See also the sections above on debugging.

6. Modify the later visitor functions to process the new feature as needed.

Adding a New Pass
-----------------

For more substantial changes, you may need to add a new pass. The simplest
way to do this is to copy the ``.cpp`` and ``.h`` files from an existing
pass. You'll need to add a call into your pass from the ``process()``
function in ``src/Verilator.cpp``.

To get your pass to build, you'll need to add its object filename to the
list in ``src/Makefile_obj.in`` and reconfigure.

"Never" features
----------------

Verilator ideally would support all of IEEE, and has the goal to get close
to full support. However the following IEEE sections and features are not
anticipated to ever be implemented, for the reasons indicated.

IEEE 1800-2017 3.3 modules within modules
   Little/no tool support, and arguably not a good practice.
IEEE 1800-2017 6.12 "shortreal"
   Little/no tool support, and easily promoted to real.
IEEE 1800-2017 11.11 Min, typ, max
   No SDF support, so will always use typical.
IEEE 1800-2017 20.16 Stochastic analysis
   Little industry use.
IEEE 1800-2017 20.17 PLA modeling
   Little industry use and outdated technology.
IEEE 1800-2017 31 Timing checks
   No longer relevant with static timing analysis tools.
IEEE 1800-2017 32 SDF annotation
   No longer relevant with static timing analysis tools.
IEEE 1800-2017 33 Config
   Little industry use.

Test Driver
===========

This section documents the test driver script, ``driver.pl``. ``driver.pl``
invokes Verilator or another simulator on each test file. For a
description of the test file contents, see `Test Language`_.

The driver reports the number of tests which pass, fail, or are skipped
(because some resource required by the test, such as SystemC, is not
available).

There are thousands of tests, and for faster completion you may want to run
the regression tests with OBJCACHE enabled and in parallel on a machine
with many cores. See the ``--j`` option and the OBJCACHE environment
variable.

driver.pl Non-Scenario Arguments
--------------------------------

--benchmark [<cycles>]
   Show execution times of each step. If an optional number is given,
   specifies the number of simulation cycles (for tests that support it).

--debug
   Same as ``verilator --debug``: Use the debug version of Verilator which
   enables additional assertions, debugging messages, and structure dump
   files.

--debugi(-<srcfile>) <level>
   Same as ``verilator --debugi level``: Set Verilator internal debugging
   level globally to the specified debug level (1-10). With the
   ``-<srcfile>`` suffix, set the level for that source file only.

--dump-tree
   Same as ``verilator --dump-tree``: Enable Verilator writing .tree debug
   files with dumping level 3, which dumps the standard critical stages.
   For details on the format see `.tree Output`_.

--gdb
   Same as ``verilator --gdb``: Run Verilator under the debugger.

--gdbbt
   Same as ``verilator --gdbbt``: Run Verilator under the debugger, only to
   print backtrace information. Requires ``--debug``.

--gdbsim
   Run Verilator generated executable under the debugger.

--golden
   Update golden files, equivalent to ``export HARNESS_UPDATE_GOLDEN=1``.

--hashset <set>/<numsets>
   Split tests based on a hash of the test names into <numsets> sets and
   run only tests in set number <set> (0..<numsets>-1).

--help
   Displays help message and exits.

--j #
   Run this number of tests in parallel, or 0 to determine the count based
   on the number of cores installed. Requires Perl's Parallel::Forker
   package.

--quiet
   Suppress all output except for failures and progress messages every 15
   seconds. Intended for use only in automated regressions. See also
   ``--rerun``, and ``--verbose`` which is not the opposite of ``--quiet``.

--rerun
   Rerun all tests that failed in this run. Reruns force the flags
   ``--no-quiet --j 1``.

--rr
   Same as ``verilator --rr``: Run Verilator and record with ``rr``.

--rrsim
   Run Verilator generated executable and record with ``rr``.

--sanitize
   Enable address sanitizer to compile Verilated C++ code. This may detect
   misuses of memory, such as out-of-bound accesses, use-after-free, and
   memory leaks.

--site
   Also run site-specific tests.

--stop
   Stop on the first error.

--trace
   Set the simulator-specific flags to request waveform tracing.

--verbose
   Compile and run the test in verbose mode. This means ``TEST_VERBOSE``
   will be defined for the test (Verilog and any C++/SystemC wrapper).

--verilated-debug
   For tests using the standard C++ wrapper, enable runtime debug mode.

driver.pl Scenario Arguments
----------------------------

The following options control which simulator is used, and which tests are
run. Multiple flags may be used to run multiple simulators/scenarios
simultaneously.

--atsim
   Run ATSIM simulator tests.

--dist
   Run simulator-agnostic distribution tests.

--ghdl
   Run GHDL simulator tests.

--iv
   Run Icarus Verilog simulator tests.

--ms
   Run ModelSim simulator tests.

--nc
   Run Cadence NC-Verilog simulator tests.

--vcs
   Run Synopsys VCS simulator tests.

--vlt
   Run Verilator tests in single-threaded mode. Default unless another
   scenario flag is provided.

--vltmt
   Run Verilator tests in multithreaded mode.

--xsim
   Run Xilinx XSim simulator tests.

driver.pl Environment
---------------------

HARNESS_UPDATE_GOLDEN
   If true, update all .out golden reference files. Typically, instead the
   ``--golden`` option is used to update only a single test's reference.

SYSTEMC
   Root directory name of SystemC kit. Only used if ``SYSTEMC_INCLUDE`` is
   not set.

SYSTEMC_INCLUDE
   Directory name with systemc.h in it.

VERILATOR_ATSIM
   Command to use to invoke Atsim.

VERILATOR_GHDL
   Command to use to invoke GHDL.

VERILATOR_GDB
   Command to use to invoke the GDB debugger.

VERILATOR_IVERILOG
   Command to use to invoke Icarus Verilog.

VERILATOR_MAKE
   Command to use to rebuild Verilator and run a single test.

VERILATOR_MODELSIM
   Command to use to invoke ModelSim.

VERILATOR_NCVERILOG
   Command to use to invoke ncverilog.

VERILATOR_ROOT
   Standard path to Verilator distribution root; see primary Verilator
   documentation.

VERILATOR_TESTS_SITE
   Used with ``--site``, a colon-separated list of directories with tests to
   be added to testlist.

VERILATOR_VCS
   Command to use to invoke VCS.

VERILATOR_XELAB
   Command to use to invoke XSim xelab.

VERILATOR_XVLOG
   Command to use to invoke XSim xvlog.

Test Language
=============

This section describes the format of the ``test_regress/t/*.pl`` test
language files, executed by ``driver.pl``.

Test Language Summary
---------------------

For convenience, a summary of the most commonly used features is provided
here, with a reference in a later section. All test files typically have a
call to the ``lint`` or ``compile`` subroutine to compile the test. For
run-time tests, this is followed by a call to the ``execute``
subroutine. Both of these functions can optionally be provided with
arguments specifying additional options.

If those complete, the script calls ``ok`` to increment the count of
successful tests and then returns 1 as its result.

The ``driver.pl`` script assumes by default that the source Verilog file
name matches the test script name. So a test whose driver is
``t/t_mytest.pl`` will expect a Verilog source file ``t/t_mytest.v``.
This can be changed using the ``top_filename`` subroutine, for example:

::

   top_filename("t/t_myothertest.v");

By default, all tests will run with major simulators (Icarus Verilog, NC,
VCS, ModelSim, etc.) as well as Verilator, to allow results to be
compared. However, if you wish a test only to be used with Verilator, you
can use the following:

::

   scenarios(vlt => 1);

Of the many options that can be set through arguments to ``compile`` and
``execute``, the following are particularly useful:

``verilator_flags2``
   A list of flags to be passed to verilator when compiling.

``fails``
   Set to 1 to indicate that the compilation or execution is intended to fail.

For example, the following would specify that compilation requires two
defines and is expected to fail.

::

   compile(
      verilator_flags2 => ["-DSMALL_CLOCK -DGATED_COMMENT"],
      fails => 1,
      );

Hints On Writing Tests
----------------------

There is generally no need for the test to create its own main program or
top level shell as the driver creates one automatically; however, some
tests require their own C++ or SystemC test harness. This is commonly given
the same name as the test, but with .cpp as suffix
(``test_regress/t/t_EXAMPLE.cpp``). This can be specified as follows:

::

   compile(
      make_top_shell => 0,
      make_main => 0,
      verilator_flags2 => ["--exe $Self->{t_dir}/$Self->{name}.cpp"],
      );

Tests should be self-checking, rather than producing lots of output. If a
test succeeds it should print ``*-* All Finished *-*`` to standard output
and terminate (in Verilog ``$finish``); if not, it should just stop (in
Verilog ``$stop``) as that signals an error.

If termination should be triggered from the C++ wrapper, the following code
can be used:

::

   vl_fatal(__FILE__, __LINE__, "dut", "<error message goes here>");
   exit(1);

Where it might be useful for a test to produce output, it should qualify
this with ``TEST_VERBOSE``. For example in Verilog:

::

   `ifdef TEST_VERBOSE
      $write("Conditional generate if MASK [%1d] = %d\n", g, MASK[g]);
   `endif

Or in a hand-written C++ wrapper:

::

   #ifdef TEST_VERBOSE
      std::cout << "Read a=" << a << std::endl;
   #endif

A filename that should be used to check the output results is given with
``expect_filename``. This should not generally be used to decide if a test
has succeeded. However, in the case of tests that are designed to fail at
compile time, it is the only option. For example:

::

   compile(
      fails => 1,
      expect_filename => $Self->{golden_filename},
      );

Note ``expect_filename`` strips some debugging information from the logfile
when comparing.

Test Language Compile/Lint/Run Arguments
----------------------------------------

This section describes common arguments to ``compile()``, ``lint()``, and
``run()``. The full list of arguments can be found by looking at the
``driver.pl`` source code.

all_run_flags
   A list of flags to be passed when running the simulator (Verilated model
   or one of the other simulators).

check_finished
   True to indicate that successful completion of the test is signaled by
   the string ``*-* All Finished *-*`` being printed on standard output.
   This is the normal way for successful tests to finish.

expect
   A quoted list of strings or regular expression to be matched in the
   output. See `Hints On Writing Tests`_ for more detail on how this
   argument should be used.

fails
   True to indicate this step is expected to fail. Tests that are expected
   to fail generally have ``_bad`` in their filename.

make_main
   False to disable the automatic creation of a C++ test wrapper (for
   example when a hand-written test wrapper is provided using
   ``verilator --exe``).

make_top_shell
   False to disable the automatic creation of a top level shell to run the
   executable (for example when a hand-written test wrapper is provided
   using ``verilator --exe``).

ms_flags / ms_flags2 / ms_run_flags
   The equivalent of ``v_flags``, ``v_flags2`` and ``all_run_flags``, but
   only for use with the ModelSim simulator.

nc_flags / nc_flags2 / nc_run_flags
   The equivalent of ``v_flags``, ``v_flags2`` and ``all_run_flags``, but
   only for use with the Cadence NC simulator.

iv_flags / iv_flags2 / iv_run_flags
   The equivalent of ``v_flags``, ``v_flags2`` and ``all_run_flags``, but
   only for use with the Icarus Verilog simulator.

v_flags
   A list of standard Verilog simulator flags to be passed to the simulator
   compiler (Verilator or one of the other simulators). This list is created
   by the driver and rarely changed; use ``v_flags2`` instead.

v_flags2
   A list of standard Verilog simulator flags to be passed to the simulator
   compiler (Verilator or one of the other simulators). Unlike ``v_flags``,
   these options may be overridden in some simulation files.

   Similar sets of flags exist for atsim, GHDL, Cadence NC, ModelSim and
   Synopsys VCS.

vcs_flags / vcs_flags2 / vcs_run_flags
   The equivalent of ``v_flags``, ``v_flags2`` and ``all_run_flags``, but
   only for use with the Synopsys VCS simulator.

verilator_flags / verilator_flags2
   The equivalent of ``v_flags`` and ``v_flags2``, but only for use with
   Verilator. If a flag is a standard flag, ``+incdir`` for example, pass
   it with ``v_flags2`` instead.

benchmarksim
   Output the number of model evaluations and execution time of a test to
   ``<test_output_dir>/<test_name>_benchmarksim.csv``. Multiple invocations
   of the same test file will append to the same .csv file.

xsim_flags / xsim_flags2 / xsim_run_flags
   The equivalent of ``v_flags``, ``v_flags2`` and ``all_run_flags``, but
   only for use with the Xilinx XSim simulator.

Distribution
============

Copyright 2008-2024 by Wilson Snyder. Verilator is free software; you can
redistribute it and/or modify it under the terms of either the GNU Lesser
General Public License Version 3 or the Perl Artistic License Version 2.0.

SPDX-License-Identifier: LGPL-3.0-only OR Artistic-2.0

.. |Logo| image:: https://www.veripool.org/img/verilator_256_200_min.png