mirror of
https://github.com/verilator/verilator.git
synced 2025-01-19 12:54:02 +00:00
Commentary
parent
63f30492be
commit
48603c0ee2
TODO
@ -6,129 +6,134 @@
// Version 2.0.


Features:
Latch optimizations {Need here}
Task I/Os connecting to non-simple variables.
Fix ordering of each bit separately in a signal (mips)
Language support:
* Fix ordering of each bit separately in a signal (mips)
assign b[3:0] = b[7:4]; assign b[7:4] = in;
Support gate primitives/ cell libraries from xilinx, etc
Assign dont_care value to a 1'bzzz assignment
Function to eval combo logic after /*verilator public*/ functions [gwaters]
Support generated clocks (correctness)
?gcov coverage
Selectable SystemC types based on widths (see notes below)
Coverage
Points should be per-scope like everything else rather than per-module
Expression coverage (see notes)
Constant functions for widths, etc, IE "input [log2(PARAM):0] xx;"
More Verilog 2001 Support
(* *) Attributes (just ignore -- preprocessor?)
Real numbers (NEVER)
Recursive functions (NEVER)
Verilog configuration files (NEVER)
DPI to define C/C++ calls from Verilog
* Support UDP gate primitives/ cell libraries
(have code for combos - problem is sequential udps)
* Function to eval combo logic after /*verilator public*/ functions [gwaters]
* Support generated clocks (correctness)
* Real numbers
* Recursive functions
* Verilog configuration files
* Structs/unions (have starting point)
* DPI to define C/C++ calls from Verilog
* Expression coverage (see notes)
* Better tristate support

Long-term Features
Assertions
VHDL parser [Philips]
Tristate support
SystemPerl integration
Multithreaded execution
* Assertions
* Tristate support
* Multithreaded execution

Configure/Make/Install
* Full MSVC++ compilation (does scons support this?) (4.000?)
* Distribute with flex/bison already expanded?
Flex library not needed. Probably too difficult to be worth it.
* Integrate SystemPerl coverage
(Note in /usr/include there are no upper cased include files.)
Coverage.pm -- Need all functionality, but in C?
Coverage/Item.pm -- Need all functionality, but in C?
Coverage/ItemKey.pm -- Need all functionality, but in C?
sp_preproc -- Some steps in here need to be moved to generated C
src/Sp.cpp -- n/a
src/SpCommon.h -- mostly overlaps verilatedos.h
src/SpCoverage.cpp/h -- All needed
src/SpFunctor.cpp/h -- No longer used
src/SpTraceVcd.cpp/h -- MOVED
src/SpTraceVcdC.cpp/h -- MOVED
src/sp_log.cpp/h -- Not needed
src/systemperl.h -- some stuff may be cut
vcoverage -- Need all functionality, but in C?

Testing:
Capture all inputs into global "rerun it" file
Code to make wrapper that sets signals, so can do comparison checks
New random program generator
Better graph viewer with search and zoom
Port and test against opencores.org code
* Move test_c/sp/v/verilated into test_regress format (4.000?)
* Capture all inputs into global "rerun it" file
* Code to make wrapper that sets signals, so can do comparison checks
* New random program generator
* Better graph viewer with search and zoom
* Port and test against opencores.org code

Usability:
Better reporting of unopt problems, including what lines of code
Report more errors (all of them?) before exiting [Eugene Weber]
* Detect and pre-remove most UNOPTFLATs (4.000)
* Better reporting of unopt problems, including what lines of code
* Report more errors (all of them?) before exiting [Eugene Weber]
* Auto-create scons config files
* Print version/etc message at runtime. (4.000?)
Include number of lines of code, percent comments, code complexity measurement
<-80chars------------------------------------------------------------------->
Verilator 3.600 - fast, free, open-sourced. Copyright 2001-2010.
Verilated #### modules, #### instances, ##### sigs,
#### non-comment lines, ##### ops, ### KB model size
* Default the --l2name to remove extra "v" level of hierarchy (flag to make "top")

Internal Code:
Eliminate the AstNUser* passed to all visitors; it's only needed in V3Width,
and removing it will speed up and simplify all the other code.
V3Graph should be templated container type, taking in Vertex + Edge types
* Eliminate the AstNUser* passed to all visitors; it's only needed in V3Width,
and removing it will speed up and simplify all the other code.
* V3Graph should be templated container type, taking in Vertex + Edge types
* Rename V3PreLex etc to match VerilogPerl filenames
* Instead of string, have a VEncodedString/VIdString which contains __DOT__ish
things, to reduce bugs. Also add _20 trailing space to \ encoded names. (4.000)

Runtime:
* New evaluation loop ~/src/verilator/notes/event_loop.txt (4.000?)
* Remove all private internal functions from top level wrapper header, move
to new level (4.000?)
* Completely standalone simulation (4.000)
main() records arguments for $test$plusvars
instantiates top,
does tracing (support $dump?)
calls top->simulateForever()
exits

Performance:
Constant propagation
* Latch optimizations
* Constant propagation
Extra cleaning AND: 1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF))
Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #))
Gated clock and latch conversion to flops. [JeanPaul Vanitegem]
* Gated clock and latch conversion to flops. [JeanPaul Vanitegem]
Could propagate the AND into pos/negedges and let domaining optimize.
Negedge reset
* Negedge reset
Switch to remove negedges that don't matter
Can't remove async resets from control flops (like in synchronizers)
If all references to array have a constant index, blow up into separate signals-per-index
Multithreaded execution
Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
Move _last sets and all other combo logic inside master
* If all references to array have a constant index, blow up into separate signals-per-index
* Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
* Move _last sets and all other combo logic inside master
if() that triggers on all possible sense items
Rewrite and combine V3Life, V3Subst
* Rewrite and combine V3Life, V3Subst
If block temp only ever set in one place to constant, propagate it
Used in t_mem for array delayed assignments
Replace variables if set later in same cfunc branch
See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop
Same assignment on both if branches
* Same assignment on both if branches
"if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..."
Careful though, as b could appear in the statement or multiple times in statement
(Could just require exactly two 'b's in statement)
Simplify XOR/XNOR/AND/OR bit selection trees
* Simplify XOR/XNOR/AND/OR bit selection trees
Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 )
Combine variables into wider elements
* Combine variables into wider elements
Parallel statements on different bits should become single signal
Variables that are always consumed in "parallel" can be joined
Duplicate assignments in gate optimization
* Duplicate assignments in gate optimization
Common to have many separate posedge blocks, each with identical
reset_r <= rst_in
*If signal is used only once (not counting trace), always gate substitute
* If signal is used only once (not counting trace), always gate substitute
Don't merge if any combining would form circ logic (out goes back to in)
Multiple assignments each bit can become single assign with concat
* Multiple assignments each bit can become single assign with concat
Make sure a SEL of a CONCAT can get the single bit back.
Usually blocks/values
* Usually blocks/values
Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out
Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
Allow Split of case statements without a $display/$stop
I-cache packing improvements (what/how?)
Data cache organization (order of vars in class)
* Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
* Allow Split of case statements without a $display/$stop
* I-cache packing improvements (what/how?)
* Data cache organization (order of vars in class)
First have clocks,
then bools instead of uint32_t's
then based on what sense list they come from, all outputs, then all inputs
finally have any signals part of a "usually" block, or constant.
Rather than tracking widths, have a MSB...LSB of this expression
* Rather than tracking widths, have a MSB...LSB of this expression
(or better, a bitmask of bits relevant in this expression)
Track recirculation and convert into clock-enables
Clock enables should become new clocking domains for speed
If flopped(a) & flopped(b) and no other a&b, then instead flop(a&b).
Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)

All of the temp vars that get set, exp pre_ vars and never feedback
(not flops) don't need to be stored in the structs, but instead can
be per-invocation, and even better register-colored-like to reuse
the space. This will greatly reduce the data footprint.
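
The "Simplify XOR/XNOR/AND/OR bit selection trees" item earlier in this Performance list relies on an identity: XOR-ing a set of selected bits gives the same result as the reduction XOR (parity) of the word masked to those bits. The following is a minimal C++ sketch of that identity only. The function names are illustrative and are not taken from the Verilator sources.

```cpp
#include <cassert>
#include <cstdint>

// XOR of individually selected bits: A[1] ^ A[2] ^ A[3].
inline uint32_t xorTree(uint32_t a) {
    return ((a >> 1) & 1u) ^ ((a >> 2) & 1u) ^ ((a >> 3) & 1u);
}

// Reduction XOR (parity) of a 32-bit word, folded in five steps.
inline uint32_t redXor(uint32_t v) {
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

// Equivalent form: mask bits 1..3 (0b1110), then take the parity.
inline uint32_t xorMasked(uint32_t a) {
    return redXor(a & 0xEu);
}
```

The masked form replaces one shift-and-mask per selected bit with a single AND followed by a fixed-cost parity fold, which is why it pays off as the number of selected bits grows.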


//**********************************************************************
//* Eventual tristate bus Stuff allowed (old verilator)

1) Tristate assignments must be continuous assignments
The RHS of a tristate assignment can be the following
a) a node (tristate or non-tristate)
b) a constant (must be all or no z's)
x'b0, x'bz, x{x'bz}, x{x'b0} -> are allowed
c) a conditional whose possible values are (a) or (b)

2) One can lose the fact that a node is a tristate node. This happens
if a tristate node is assigned to a 'standard' node, or is used on
RHS of a conditional. The following infer tristate signals:
a) inout <SIGNAL>
b) tri <SIGNAL>
c) assigning to 'Z' (maybe through a conditional)
Note: tristate-ness of an output port determined only by
statements in the module (not the instances it calls)

4) Tristate variables can't be multidimensional arrays
5) Only check tristate contention between modules (not within!)
6) Only simple compares with 'Z' are allowed (===)

* Track recirculation and convert into clock-enables
* Clock enables should become new clocking domains for speed
* If flopped(a) & flopped(b) and no other a&b, then instead flop(a&b).
* Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)
internals.pod
@ -40,7 +40,134 @@ Modify the later visitor functions to process the new feature as needed.

=back

=head1 DEBUG OUTPUT/ TREE FILES
=head1 CODE FLOWS

=head2 Verilator Flow

The main flow of Verilator can be followed by reading the Verilator.cpp
process() function:

First, the files specified on the command line are read. Reading involves
preprocessing, then lexical analysis with Flex and parsing with Bison.
This produces an abstract syntax tree (AST) representation of the design,
which is what is visible in the .tree files described below.

Cells are then linked, which will read and parse additional files as above.

Function, variable, and other references are linked to their definitions.

Parameters are resolved and the design is elaborated.

Verilator then performs many additional edits and optimizations on the
hierarchical design. This includes coverage, assertions, X elimination,
inlining, constant propagation, and dead code elimination.

References in the design are then pseudo-flattened. Each module's variables
and functions get "Scope" references. A scope reference is an occurrence of
that un-flattened variable in the flattened hierarchy. A module that occurs
only once in the hierarchy will have a single scope and single VarScope for
each variable. A module that occurs twice will have a scope for each
occurrence, and two VarScopes for each variable. This allows optimizations
to proceed across the flattened design, while still preserving the
hierarchy.
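
As a rough illustration of the scope idea described above, the sketch below builds one Scope per module occurrence and one VarScope per variable per occurrence. The types Module, Scope and VarScope here are simplified, hypothetical stand-ins written for this example; they are not Verilator's AST classes, and the path naming is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the concepts in the text.
struct Module {
    std::string name;
    std::vector<std::string> vars;     // variables declared in the module
    std::vector<const Module*> cells;  // modules instantiated inside it
};
struct VarScope {
    std::string path;  // one variable at one place in the hierarchy
};
struct Scope {
    std::string path;
    const Module* modp;  // the (shared) module this scope is an occurrence of
    std::vector<VarScope> varScopes;
};

// Walk the hierarchy, creating one Scope per module occurrence.
void buildScopes(const Module& mod, const std::string& path, std::vector<Scope>& out) {
    Scope scope{path, &mod, {}};
    for (const std::string& var : mod.vars) scope.varScopes.push_back({path + "." + var});
    out.push_back(scope);
    int n = 0;
    for (const Module* cellp : mod.cells) {
        buildScopes(*cellp, path + "." + cellp->name + std::to_string(n++), out);
    }
}
```

A module instantiated twice yields two Scope entries that point at the same Module, so a pass can reason about each occurrence separately while the module itself stays shared.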

Additional edits and optimizations proceed on the pseudo-flat design. These
include module references, function inlining, loop unrolling, variable
lifetime analysis, lookup table creation, always splitting, and logic gate
simplifications (pushing inverters, etc).

Verilator orders the code. Best case, this results in a single "eval"
function which has all always statements flowing from top to bottom with no
loops.

Verilator mostly removes the flattening, so that code may be shared between
multiple invocations of the same module. It localizes variables, combines
identical functions, expands macros to C primitives, adds branch prediction
hints, and performs additional constant propagation.

Verilator finally writes the C++ modules.

=head2 Verilated Flow

The evaluation loop emitted by Verilator is designed to allow a single
function to perform evaluation under most situations.

On the first evaluation, the Verilated code calls initial blocks, and then
"settles" the modules, by evaluating functions (from always statements)
until all signals are stable.

On other evaluations, the Verilated code detects which input signals have
changed. If any are clocks, it calls the appropriate sequential functions
(from always @ posedge statements). Interspersed with sequential functions
it calls combo functions (from always @*). After this is complete, it
detects any changes due to combo loops or internally generated clocks, and
if any are found it must evaluate the model again.
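
The shape of that flow can be sketched in a few lines of C++. This is a hand-written toy model under stated assumptions, not code that Verilator generates: the struct name, the member names, and the single flop plus inverter design are all invented for illustration, and the settle loop stands in for the change detection described above.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: q <= d on posedge clk, and y = !q combinationally.
struct ToyModel {
    uint8_t clk = 0, d = 0;  // inputs
    uint8_t q = 0, y = 0;    // state and output
    uint8_t clkLast = 0;     // previous clk value, for edge detection
    bool didInit = false;

    void initial() { q = 0; }  // from an initial block
    void combo() { y = !q; }   // from an always @* block
    void sequent() { q = d; }  // from an always @(posedge clk) block

    // Re-run combo logic until the output stops changing.
    void settle() {
        uint8_t prev;
        do {
            prev = y;
            combo();
        } while (y != prev);
    }

    void eval() {
        if (!didInit) {  // first evaluation: initial blocks, then settle
            didInit = true;
            initial();
            settle();
            clkLast = clk;
            return;
        }
        const bool posedge = clk && !clkLast;  // detect the clock change
        clkLast = clk;
        if (posedge) sequent();
        settle();
    }
};
```

A caller toggles `clk`, sets `d`, and calls `eval()` after each change; the single entry point hides whether the call was the first settle pass or a later clocked evaluation.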

For SystemC code, the eval() function is wrapped in a SystemC SC_METHOD,
sensitive to all inputs. (Ideally it would only be sensitive to clocks and
combo inputs, but tracing requires all signals to cause evaluation, and the
performance difference is small.)

If tracing is enabled, a callback examines all variables in the design for
changes, and writes the trace for each change. To accelerate this process
the evaluation process records a bitmask of variables that might have
changed; if clear, checking those signals for changes may be skipped.
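
The bitmask shortcut can be illustrated as follows. This is a minimal sketch assuming signals are grouped and each group owns one activity bit; the grouping, the class names and the `compares` counter are assumptions made for the example and do not describe Verilator's actual trace code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct TracedModel {
    uint32_t a = 0, b = 0;  // traced signals, group 0
    uint32_t c = 0;         // traced signal, group 1
    uint32_t activity = 0;  // bit N set: group N might have changed

    void setA(uint32_t v) { a = v; activity |= 1u << 0; }
    void setC(uint32_t v) { c = v; activity |= 1u << 1; }
};

struct Tracer {
    uint32_t oldA = 0, oldB = 0, oldC = 0;  // last values written to the trace
    int compares = 0;                       // how many old/new checks were made
    std::vector<std::string> changes;       // names of signals written

    void check(const char* name, uint32_t& oldv, uint32_t newv) {
        ++compares;
        if (oldv != newv) {
            oldv = newv;
            changes.push_back(name);
        }
    }

    // Trace callback: only examine groups whose activity bit is set.
    void dump(TracedModel& m) {
        if (m.activity & (1u << 0)) {
            check("a", oldA, m.a);
            check("b", oldB, m.b);
        }
        if (m.activity & (1u << 1)) check("c", oldC, m.c);
        m.activity = 0;  // start the next evaluation with a clean mask
    }
};
```

Note the mask is conservative: a set bit means "might have changed", so the tracer still compares old and new values before writing anything.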

=head1 VISITOR FUNCTIONS

=head2 Passing Variables

There are three ways data is passed between visitor functions.

1. A visitor-class member variable. This is generally for passing "parent"
information down to children. m_modp is a common example. It's set to
NULL in the constructor; the AstModule visitor then sets it, iterates the
children, and clears it. Children under an AstModule
will see it set, while nodes elsewhere will see it clear. If there can be
nested items (for example an AstFor under an AstFor) the variable needs to
be save-set-restored in the AstFor visitor, otherwise exiting the lower for
will lose the upper for's setting.
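
The save-set-restore pattern for nested items looks roughly like this. The Node and ForVisitor types are a hypothetical miniature written for this example, not Verilator's AstNode or visitor classes, and the `enclosingFor` field exists only so the effect can be observed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature AST node.
struct Node {
    bool isFor = false;
    std::vector<Node*> children;
    Node* enclosingFor = nullptr;  // recorded by the visitor, for demonstration
};

class ForVisitor {
    Node* m_forp = nullptr;  // nearest enclosing for-loop, or nullptr

public:
    void visit(Node* nodep) {
        nodep->enclosingFor = m_forp;  // children read the "parent" state
        if (nodep->isFor) {
            Node* const savedForp = m_forp;  // save
            m_forp = nodep;                  // set
            for (Node* childp : nodep->children) visit(childp);
            m_forp = savedForp;              // restore, rather than clear
        } else {
            for (Node* childp : nodep->children) visit(childp);
        }
    }
};
```

If the last step cleared `m_forp` to nullptr instead of restoring it, any statement following an inner loop would wrongly appear to be outside every loop.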

2. User() attributes. Each node has 5 ->user() number or ->userp() pointer
utility values (a common technique lifted from graph traversal packages).
A visitor first clears the one it wants to use by calling
AstNode::user#ClearTree(), then it can mark any node's user() with whatever
data it wants. Readers just call nodep->user(), but may need to cast
appropriately, so you'll often see nodep->userp()->castSOMETYPE(). At the
top of each visitor are comments describing how the user() stuff applies to
that visitor class. For example:

    // NODE STATE
    // Cleared entire netlist
    //   AstModule::user1p()     // bool.  True to inline this module

This says that at the AstNetlist user1ClearTree() is called. Each
AstModule's user1() is used to indicate if we're going to inline it.

These comments are important to make sure a user#() on a given AstNode type
is never being used for two different purposes.

Note that calling user#ClearTree is fast; it doesn't walk the tree, so it's
ok to call fairly often. For example, it's commonly called on every
module.
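
The text does not say how user#ClearTree avoids walking the tree. One common way to get a constant-time clear is a generation counter, sketched below as an assumption rather than as a description of Verilator's implementation. The class name MiniNode and its members are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

class MiniNode {
    // Shared generation counter; bumping it invalidates every node's value.
    static uint32_t& generation() {
        static uint32_t s_gen = 1;
        return s_gen;
    }
    uint32_t m_user1Gen = 0;  // generation in which m_user1 was last written
    int m_user1 = 0;

public:
    // Constant-time "clear" of user1 on all nodes: no tree walk needed.
    static void user1ClearTree() { ++generation(); }

    void user1(int value) {
        m_user1 = value;
        m_user1Gen = generation();
    }
    // A value written in an older generation reads back as cleared (0).
    int user1() const { return m_user1Gen == generation() ? m_user1 : 0; }
};
```

With this scheme a visitor can call `user1ClearTree()` once per module at negligible cost, which matches the usage pattern described above.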

3. Parameters can be passed between the visitors in close to the "normal"
function caller to callee way. This is the second "vup" parameter that is
ignored on most of the visitor functions. V3Width does this, but it proved
messier than the above and is deprecated. (V3Width was nearly the first
module written. Someday this scheme may be removed, as it slows the
program down to have to pass vup everywhere.)

=head1 TESTING

To write a test see notes in the forum and in the verilator.txt manual.

Note you can run the regression tests in parallel; see the
test_regress/driver.pl script -j flag.

=head1 DEBUGGING

=head2 --debug

When you run with --debug there are two primary output file types placed into
the obj_dir, .tree and .dot files.
@ -94,59 +221,7 @@ variable is an output.

=back

=head1 TESTING

To write a test see notes in the forum and in the verilator.txt manual.

Note you can run the regression tests in parallel; see the
test_regress/driver.pl script -j flag.

=head1 VISITOR FUNCTIONS

=head2 Passing Variables

There are three ways data is passed between visitor functions.

1. A visitor-class member variable. This is generally for passing "parent"
information down to children. m_modp is a common example. It's set to
NULL in the constructor; the AstModule visitor then sets it, iterates the
children, and clears it. Children under an AstModule
will see it set, while nodes elsewhere will see it clear. If there can be
nested items (for example an AstFor under an AstFor) the variable needs to
be save-set-restored in the AstFor visitor, otherwise exiting the lower for
will lose the upper for's setting.

2. User() attributes. Each node has 5 ->user() number or ->userp() pointer
utility values (a common technique lifted from graph traversal packages).
A visitor first clears the one it wants to use by calling
AstNode::user#ClearTree(), then it can mark any node's user() with whatever
data it wants. Readers just call nodep->user(), but may need to cast
appropriately, so you'll often see nodep->userp()->castSOMETYPE(). At the
top of each visitor are comments describing how the user() stuff applies to
that visitor class. For example:

    // NODE STATE
    // Cleared entire netlist
    //   AstModule::user1p()     // bool.  True to inline this module

This says that at the AstNetlist user1ClearTree() is called. Each
AstModule's user1() is used to indicate if we're going to inline it.

These comments are important to make sure a user#() on a given AstNode type
is never being used for two different purposes.

Note that calling user#ClearTree is fast; it doesn't walk the tree, so it's
ok to call fairly often. For example, it's commonly called on every
module.

3. Parameters can be passed between the visitors in close to the "normal"
function caller to callee way. This is the second "vup" parameter that is
ignored on most of the visitor functions. V3Width does this, but it proved
messier than the above and is deprecated. (V3Width was nearly the first
module written. Someday this scheme may be removed, as it slows the
program down to have to pass vup everywhere.)

=head1 DEBUGGING WITH GDB
=head2 Debugging with GDB

The test_regress/driver.pl script accepts --debug --gdb to start Verilator
under gdb. You can also use --debug --gdbbt to just backtrace and then