Commentary

Wilson Snyder 2010-02-10 08:50:41 -05:00
parent 63f30492be
commit 48603c0ee2
2 changed files with 223 additions and 143 deletions

TODO

@ -6,129 +6,134 @@
// Version 2.0.
Features:
Latch optimizations {Need here}
Task I/Os connecting to non-simple variables.
Fix ordering of each bit separately in a signal (mips)
Language support:
* Fix ordering of each bit separately in a signal (mips)
assign b[3:0] = b[7:4]; assign b[7:4] = in;
Support gate primitives/ cell libraries from xilinx, etc
Assign dont_care value to an 1'bzzz assignment
Function to eval combo logic after /*verilator public*/ functions [gwaters]
Support generated clocks (correctness)
?gcov coverage
Selectable SystemC types based on widths (see notes below)
Coverage
Points should be per-scope like everything else rather than per-module
Expression coverage (see notes)
Constant functions for widths, etc, IE "input [log2(PARAM):0] xx;"
More Verilog 2001 Support
(* *) Attributes (just ignore -- preprocessor?)
Real numbers (NEVER)
Recursive functions (NEVER)
Verilog configuration files (NEVER)
DPI to define C/C++ calls from Verilog
* Support UDP gate primitives/ cell libraries
(have code for combos - problem is sequential udps)
* Function to eval combo logic after /*verilator public*/ functions [gwaters]
* Support generated clocks (correctness)
* Real numbers
* Recursive functions
* Verilog configuration files
* Structs/unions (have starting point)
* DPI to define C/C++ calls from Verilog
* Expression coverage (see notes)
* Better tristate support
Long-term Features
Assertions
VHDL parser [Philips]
Tristate support
SystemPerl integration
Multithreaded execution
* Assertions
* Tristate support
* Multithreaded execution
Configure/Make/Install
* Full MSVC++ compilation (does scons support this?) (4.000?)
* Distribute with flex/bison already expanded?
Flex library not needed. Probably too difficult to be worth it.
* Integrate SystemPerl coverage
(Note in /usr/include there are no upper cased include files.)
Coverage.pm -- Need all functionality, but in C?
Coverage/Item.pm -- Need all functionality, but in C?
Coverage/ItemKey.pm -- Need all functionality, but in C?
sp_preproc -- Some steps in here need to be moved to generated C
src/Sp.cpp -- n/a
src/SpCommon.h -- mostly overlaps verilatedos.h
src/SpCoverage.cpp/h -- All needed
src/SpFunctor.cpp/h -- No longer used
src/SpTraceVcd.cpp/h -- MOVED
src/SpTraceVcdC.cpp/h -- MOVED
src/sp_log.cpp/h -- Not needed
src/systemperl.h -- some stuff may be cut
vcoverage -- Need all functionality, but in C?
Testing:
Capture all inputs into global "rerun it" file
Code to make wrapper that sets signals, so can do comparison checks
New random program generator
Better graph viewer with search and zoom
Port and test against opencores.org code
* Move test_c/sp/v/verilated into test_regress format (4.000?)
* Capture all inputs into global "rerun it" file
* Code to make wrapper that sets signals, so can do comparison checks
* New random program generator
* Better graph viewer with search and zoom
* Port and test against opencores.org code
Usability:
Better reporting of unopt problems, including what lines of code
Report more errors (all of them?) before exiting [Eugene Weber]
* Detect and pre-remove most UNOPTFLATs (4.000)
* Better reporting of unopt problems, including what lines of code
* Report more errors (all of them?) before exiting [Eugene Weber]
* Auto-create scons config files
* Print version/etc message at runtime. (4.000?)
Include number of lines of code, percent comments, code complexity measurement
<-80chars------------------------------------------------------------------->
Verilator 3.600 - fast, free, open-sourced. Copyright 2001-2010.
Verilated #### modules, #### instances, ##### sigs,
#### non-comment lines, ##### ops, ### KB model size
* Default the --l2name to remove extra "v" level of hierarchy (flag to make "top")
Internal Code:
Eliminate the AstNUser* passed to all visitors; it's only needed in V3Width,
and removing it will speed up and simplify all the other code.
V3Graph should be templated container type, taking in Vertex + Edge types
* Eliminate the AstNUser* passed to all visitors; it's only needed in V3Width,
and removing it will speed up and simplify all the other code.
* V3Graph should be templated container type, taking in Vertex + Edge types
* Rename V3PreLex etc to match VerilogPerl filenames
* Instead of string, have a VEncodedString/VIdString which contains __DOT__ish
things, to reduce bugs. Also add _20 trailing space to \ encoded names. (4.000)
Runtime:
* New evaluation loop ~/src/verilator/notes/event_loop.txt (4.000?)
* Remove all private internal functions from top level wrapper header, move
to new level (4.000?)
* Completely standalone simulation (4.000)
main() records arguments for $test$plusvars
instantiates top,
does tracing (support $dump?)
calls top->simulateForever()
exits
Performance:
Constant propagation
* Latch optimizations
* Constant propagation
Extra cleaning AND: 1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF))
Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #))
Gated clock and latch conversion to flops. [JeanPaul Vanitegem]
* Gated clock and latch conversion to flops. [JeanPaul Vanitegem]
Could propagate the AND into pos/negedges and let domaining optimize.
Negedge reset
* Negedge reset
Switch to remove negedges that don't matter
Can't remove async resets from control flops (like in synchronizers)
If all references to array have a constant index, blow up into separate signals-per-index
Multithreaded execution
Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
Move _last sets and all other combo logic inside master
* If all references to array have a constant index, blow up into separate signals-per-index
* Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
* Move _last sets and all other combo logic inside master
if() that triggers on all possible sense items
Rewrite and combine V3Life, V3Subst
* Rewrite and combine V3Life, V3Subst
If block temp only ever set in one place to constant, propagate it
Used in t_mem for array delayed assignments
Replace variables if set later in same cfunc branch
See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop
Same assignment on both if branches
* Same assignment on both if branches
"if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..."
Careful though, as b could appear in the statement or multiple times in statement
(Could just require exactly two 'b's in statement)
Simplify XOR/XNOR/AND/OR bit selection trees
* Simplify XOR/XNOR/AND/OR bit selection trees
Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 )
Combine variables into wider elements
* Combine variables into wider elements
Parallel statements on different bits should become single signal
Variables that are always consumed in "parallel" can be joined
Duplicate assignments in gate optimization
* Duplicate assignments in gate optimization
Common to have many separate posedge blocks, each with identical
reset_r <= rst_in
*If signal is used only once (not counting trace), always gate substitute
* If signal is used only once (not counting trace), always gate substitute
Don't merge if any combining would form circ logic (out goes back to in)
Multiple assignments each bit can become single assign with concat
* Multiple assignments each bit can become single assign with concat
Make sure a SEL of a CONCAT can get the single bit back.
Usually blocks/values
* Usually blocks/values
Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out
Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
Allow Split of case statements without a $display/$stop
I-cache packing improvements (what/how?)
Data cache organization (order of vars in class)
* Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
* Allow Split of case statements without a $display/$stop
* I-cache packing improvements (what/how?)
* Data cache organization (order of vars in class)
First have clocks,
then bools instead of uint32_t's
then based on what sense list they come from, all outputs, then all inputs
finally have any signals part of a "usually" block, or constant.
Rather than tracking widths, have a MSB...LSB of this expression
* Rather than tracking widths, have a MSB...LSB of this expression
(or better, a bitmask of bits relevant in this expression)
Track recirculation and convert into clock-enables
Clock enables should become new clocking domains for speed
If flopped(a) & flopped(b) and no other a&b, then instead flop(a&b).
Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)
All of the temp vars that get set, exp pre_ vars and never feedback
(not flops) don't need to be stored in the structs, but instead can
be per-invocation, and even better register-colored-like to reuse
the space. This will greatly reduce the data footprint.
//**********************************************************************
//* Eventual tristate bus Stuff allowed (old verilator)
1) Tristate assignments must be continuous assignments
The RHS of a tristate assignment can be the following
a) a node (tristate or non-tristate)
b) a constant (must be all or no z's)
x'b0, x'bz, x{x'bz}, x{x'b0} -> are allowed
c) a conditional whose possible values are (a) or (b)
2) One can lose the fact that a node is a tristate node. This happens
if a tristate node is assigned to a 'standard' node, or is used on
RHS of a conditional. The following infer tristate signals:
a) inout <SIGNAL>
b) tri <SIGNAL>
c) assigning to 'Z' (maybe through a conditional)
Note: tristate-ness of an output port determined only by
statements in the module (not the instances it calls)
4) Tristate variables can't be multidimensional arrays
5) Only check tristate contention between modules (not within!)
6) Only simple compares with 'Z' are allowed (===)
* Track recirculation and convert into clock-enables
* Clock enables should become new clocking domains for speed
* If flopped(a) & flopped(b) and no other a&b, then instead flop(a&b).
* Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)


@ -40,7 +40,134 @@ Modify the later visitor functions to process the new feature as needed.
=back
=head1 DEBUG OUTPUT/ TREE FILES
=head1 CODE FLOWS
=head2 Verilator Flow
The main flow of Verilator can be followed by reading the Verilator.cpp
process() function:
First, the files specified on the command line are read. Reading involves
preprocessing, then lexical analysis with Flex and parsing with Bison.
This produces an abstract syntax tree (AST) representation of the design,
which is what is visible in the .tree files described below.
Cells are then linked, which will read and parse additional files as above.
Function, variable, and other references are linked to their definitions.
Parameters are resolved and the design is elaborated.
Verilator then performs many additional edits and optimizations on the
hierarchical design. This includes coverage, assertions, X elimination,
inlining, constant propagation, and dead code elimination.
References in the design are then pseudo-flattened. Each module's variables
and functions get "Scope" references. A scope reference is an occurrence of
that un-flattened variable in the flattened hierarchy. A module that occurs
only once in the hierarchy will have a single scope and single VarScope for
each variable. A module that occurs twice will have a scope for each
occurrence, and two VarScopes for each variable. This allows optimizations
to proceed across the flattened design, while still preserving the
hierarchy.
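The scope idea can be sketched with a toy model. This is an illustration under stated assumptions, not Verilator's AST: the Module, Cell, Scope, and VarScope structs and the flatten() and scopeCount() helpers are hypothetical names invented for this example. It shows only the counting rule described above: one Scope per occurrence of a module, and one VarScope per variable per occurrence, while the module itself stays shared.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical, not Verilator's AST classes.
struct Module;
struct Cell {  // An instantiation of a module inside a parent module
    std::string name;
    const Module* modp;
};
struct Module {
    std::string name;
    std::vector<std::string> vars;  // Variables declared in the module
    std::vector<Cell> cells;        // Submodule instances
};
struct VarScope {      // One occurrence of a variable in the flattened hierarchy
    std::string path;  // For example "top.a.q"
};
struct Scope {           // One occurrence of a module in the flattened hierarchy
    std::string path;    // For example "top.a"
    const Module* modp;  // The shared, un-flattened module
    std::vector<VarScope> varScopes;
};

// Walk the hierarchy, creating one Scope per module occurrence and one
// VarScope per variable per occurrence.
void flatten(const Module& mod, const std::string& path, std::vector<Scope>& out) {
    Scope scope{path, &mod, {}};
    for (const std::string& var : mod.vars) {
        scope.varScopes.push_back(VarScope{path + "." + var});
    }
    out.push_back(scope);
    for (const Cell& cell : mod.cells) {
        assert(cell.modp);
        flatten(*cell.modp, path + "." + cell.name, out);
    }
}

// Count how many scopes refer back to the given module.
int scopeCount(const std::vector<Scope>& scopes, const Module& mod) {
    int count = 0;
    for (const Scope& scope : scopes) {
        if (scope.modp == &mod) ++count;
    }
    return count;
}
```

With a "top" module that instantiates "sub" twice as cells a and b, flatten() yields three scopes; the two scopes for "sub" point at the same Module but carry distinct VarScopes (top.a.q and top.b.q).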
Additional edits and optimizations proceed on the pseudo-flat design. These
include module references, function inlining, loop unrolling, variable
lifetime analysis, lookup table creation, always splitting, and logic gate
simplifications (pushing inverters, etc).
Verilator orders the code. Best case, this results in a single "eval"
function which has all always statements flowing from top to bottom with no
loops.
Verilator mostly removes the flattening, so that code may be shared between
multiple invocations of the same module. It localizes variables, combines
identical functions, expands macros to C primitives, adds branch prediction
hints, and performs additional constant propagation.
Verilator finally writes the C++ modules.
=head2 Verilated Flow
The evaluation loop outputted by Verilator is designed to allow a single
function to perform evaluation under most situations.
On the first evaluation, the Verilated code calls initial blocks, and then
"settles" the modules, by evaluating functions (from always statements)
until all signals are stable.
On other evaluations, the Verilated code detects which input signals have
changed. If any are clocks, it calls the appropriate sequential functions
(from always @ posedge statements). Interspersed with sequential functions,
it calls combo functions (from always @*). After this is complete, it
detects any changes due to combo loops or internally generated clocks, and
if one is found, it must reevaluate the model.
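The shape of that evaluation loop can be sketched by hand. The ToyModel below is a hypothetical, hand-written stand-in for generated code; its member and function names (eval, initial, seqPosedgeClk, combo, settle) are assumptions made for this example and are not the names Verilator emits. It models one flop (count) and one combo signal (odd), and repeats the combo logic until nothing changes.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written toy model; all names here are hypothetical.
struct ToyModel {
    // Inputs
    bool clk = false;
    bool enable = false;
    // State and outputs
    uint32_t count = 0;  // always @(posedge clk) if (enable) count <= count + 1;
    bool odd = false;    // always @* odd = count[0];

    void eval() {
        if (!m_initialized) {
            // First evaluation: run initial blocks, then settle
            m_initialized = true;
            initial();
            settle();
        } else {
            // Later evaluations: detect a rising clock edge, run the
            // sequential code, then the combo code until stable
            if (clk && !m_lastClk) seqPosedgeClk();
            settle();
        }
        m_lastClk = clk;
    }

private:
    bool m_initialized = false;
    bool m_lastClk = false;

    void initial() { count = 0; }
    void seqPosedgeClk() {
        if (enable) ++count;
    }
    // Returns true if any combo output changed
    bool combo() {
        const bool newOdd = (count & 1u) != 0;
        const bool changed = (newOdd != odd);
        odd = newOdd;
        return changed;
    }
    // Re-run the combo logic until no signal changes
    void settle() {
        int passes = 0;
        while (combo()) {
            ++passes;
            assert(passes < 100 && "model did not converge");
        }
    }
};
```

A caller toggles clk and calls eval() after each change; calling eval() again with clk still high does nothing further, because no edge is detected.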
For SystemC code, the eval() function is wrapped in a SystemC SC_METHOD,
sensitive to all inputs. (Ideally it would only be sensitive to clocks and
combo inputs, but tracing requires all signals to cause evaluation, and the
performance difference is small.)
If tracing is enabled, a callback examines all variables in the design for
changes, and writes the trace for each change. To accelerate this process
the evaluation process records a bitmask of variables that might have
changed; if clear, checking those signals for changes may be skipped.
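The activity bitmask can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the TraceSketch class, the grouping of signals into numbered activity groups, and the "sN=value" output lines are invented here and do not reflect the generated trace code or the VCD format. The point shown is only that a clear bit lets the trace callback skip the compare for every signal in that group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy trace writer; the class and its members are hypothetical.
class TraceSketch {
public:
    // groups[i] is the activity group (0..31) that signal i belongs to
    explicit TraceSketch(const std::vector<int>& groups)
        : m_values(groups.size(), 0)
        , m_traced(groups.size(), 0)
        , m_group(groups) {
        for (int g : m_group) assert(g >= 0 && g < 32);
    }

    // Called by evaluation code: store a value, and mark the signal's group
    // as "might have changed"
    void write(std::size_t sig, uint32_t value) {
        m_values[sig] = value;
        m_activity |= (1u << m_group[sig]);
    }

    // Trace callback: compare only signals whose group bit is set
    std::vector<std::string> dump() {
        std::vector<std::string> lines;
        for (std::size_t i = 0; i < m_values.size(); ++i) {
            if (!(m_activity & (1u << m_group[i]))) continue;  // Group clean: skip
            ++compares;
            if (m_values[i] != m_traced[i]) {
                m_traced[i] = m_values[i];
                lines.push_back("s" + std::to_string(i) + "=" + std::to_string(m_values[i]));
            }
        }
        m_activity = 0;  // All groups are clean until the next write
        return lines;
    }

    int compares = 0;  // Number of value compares performed so far

private:
    std::vector<uint32_t> m_values;  // Current signal values
    std::vector<uint32_t> m_traced;  // Values last written to the trace
    std::vector<int> m_group;        // Activity group of each signal
    uint32_t m_activity = 0;         // Bit N set: group N might have changed
};
```

With signals 0 and 1 in group 0 and signal 2 in group 1, a write to signal 0 makes the next dump() compare two signals and skip the third; a second dump() with no intervening writes compares nothing.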
=head1 VISITOR FUNCTIONS
=head2 Passing Variables
There are three ways data is passed between visitor functions.
1. A visitor-class member variable. This is generally for passing "parent"
information down to children. m_modp is a common example. It's set to
NULL in the constructor; the visitor for that node (the AstModule visitor)
sets it, then iterates the children, then clears it. Children under an
AstModule will see it set, while nodes elsewhere will see it clear. If
there can be nested items (for example an AstFor under an AstFor) the
variable needs to be save-set-restored in the AstFor visitor, otherwise
exiting the lower for will lose the upper for's setting.
2. User() attributes. Each node has 5 ->user() number or ->userp() pointer
utility values (a common technique lifted from graph traversal packages).
A visitor first clears the one it wants to use by calling
AstNode::user#ClearTree(), then it can mark any node's user() with whatever
data it wants. Readers just call nodep->user(), but may need to cast
appropriately, so you'll often see nodep->userp()->castSOMETYPE(). At the
top of each visitor are comments describing how the user() stuff applies to
that visitor class. For example:
// NODE STATE
// Cleared entire netlist
// AstModule::user1p() // bool. True to inline this module
This says that at the AstNetlist user1ClearTree() is called. Each
AstModule's user1() is used to indicate if we're going to inline it.
These comments are important to make sure a user#() on a given AstNode type
is never being used for two different purposes.
Note that calling user#ClearTree is fast; it doesn't walk the tree, so it's
ok to call fairly often. For example, it's commonly called on every
module.
3. Parameters can be passed between the visitors in close to the "normal"
function caller to callee way. This is the second "vup" parameter that is
ignored on most of the visitor functions. V3Width does this, but it proved
messier than the above and is deprecated. (V3Width was nearly the first
module written. Someday this scheme may be removed, as it slows the
program down to have to pass vup everywhere.)
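The first two techniques can be sketched on a toy tree. The Node and ForVisitor classes below are hypothetical and are not Verilator's AstNode or visitor classes. The visitor shows save-set-restore of a member variable (m_forp) around nested for-loop nodes. The user1 accessors show one way a tree-wide clear can avoid walking the tree, namely a generation counter; that this matches the real implementation is an assumption of this sketch, not something stated above.

```cpp
#include <cassert>
#include <vector>

// Toy node type for illustration; not Verilator's AstNode.
class Node {
public:
    bool isFor = false;                  // True if this node models a for loop
    std::vector<Node*> children;
    const Node* enclosingFor = nullptr;  // Result recorded by ForVisitor

    // user1 value with O(1) tree-wide clearing: a stored value counts only
    // if it was written during the current generation.
    int user1() const { return m_user1Gen == s_user1Gen ? m_user1 : 0; }
    void user1(int value) {
        m_user1 = value;
        m_user1Gen = s_user1Gen;
    }
    static void user1ClearTree() { ++s_user1Gen; }  // No tree walk needed

private:
    int m_user1 = 0;
    unsigned m_user1Gen = 0;
    static unsigned s_user1Gen;
};
unsigned Node::s_user1Gen = 1;

class ForVisitor {
public:
    void visit(Node* nodep) {
        assert(nodep);
        nodep->enclosingFor = m_forp;  // What this node sees from its parents
        if (nodep->isFor) {
            const Node* const savedForp = m_forp;  // Save
            m_forp = nodep;                        // Set
            iterateChildren(nodep);
            m_forp = savedForp;                    // Restore
        } else {
            iterateChildren(nodep);
        }
    }

private:
    const Node* m_forp = nullptr;  // Innermost enclosing for loop, if any

    void iterateChildren(Node* nodep) {
        for (Node* childp : nodep->children) visit(childp);
    }
};
```

If the restore step simply reset m_forp to NULL instead, a node that follows an inner for loop inside an outer one would wrongly see no enclosing loop. That is the failure the save-set-restore pattern prevents.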
=head1 TESTING
To write a test see notes in the forum and in the verilator.txt manual.
Note you can run the regression tests in parallel; see the
test_regress/driver.pl script -j flag.
=head1 DEBUGGING
=head2 --debug
When you run with --debug there are two primary output file types placed into
the obj_dir, .tree and .dot files.
@ -94,59 +221,7 @@ variable is an output.
=back
=head1 TESTING
To write a test see notes in the forum and in the verilator.txt manual.
Note you can run the regression tests in parallel; see the
test_regress/driver.pl script -j flag.
=head1 VISITOR FUNCTIONS
=head2 Passing Variables
There are three ways data is passed between visitor functions.
1. A visitor-class member variable. This is generally for passing "parent"
information down to children. m_modp is a common example. It's set to
NULL in the constructor; the visitor for that node (the AstModule visitor)
sets it, then iterates the children, then clears it. Children under an
AstModule will see it set, while nodes elsewhere will see it clear. If
there can be nested items (for example an AstFor under an AstFor) the
variable needs to be save-set-restored in the AstFor visitor, otherwise
exiting the lower for will lose the upper for's setting.
2. User() attributes. Each node has 5 ->user() number or ->userp() pointer
utility values (a common technique lifted from graph traversal packages).
A visitor first clears the one it wants to use by calling
AstNode::user#ClearTree(), then it can mark any node's user() with whatever
data it wants. Readers just call nodep->user(), but may need to cast
appropriately, so you'll often see nodep->userp()->castSOMETYPE(). At the
top of each visitor are comments describing how the user() stuff applies to
that visitor class. For example:
// NODE STATE
// Cleared entire netlist
// AstModule::user1p() // bool. True to inline this module
This says that at the AstNetlist user1ClearTree() is called. Each
AstModule's user1() is used to indicate if we're going to inline it.
These comments are important to make sure a user#() on a given AstNode type
is never being used for two different purposes.
Note that calling user#ClearTree is fast; it doesn't walk the tree, so it's
ok to call fairly often. For example, it's commonly called on every
module.
3. Parameters can be passed between the visitors in close to the "normal"
function caller to callee way. This is the second "vup" parameter that is
ignored on most of the visitor functions. V3Width does this, but it proved
messier than the above and is deprecated. (V3Width was nearly the first
module written. Someday this scheme may be removed, as it slows the
program down to have to pass vup everywhere.)
=head1 DEBUGGING WITH GDB
=head2 Debugging with GDB
The test_regress/driver.pl script accepts --debug --gdb to start Verilator
under gdb. You can also use --debug --gdbbt to just backtrace and then