Commentary

2010-02-10 08:50:41 -05:00 · 2010-02-10 08:50:41 -05:00 · 48603c0ee2
commit 48603c0ee2
parent 63f30492be
2 changed files with 223 additions and 143 deletions
--- a/183
+++ b/183
@@ -6,129 +6,134 @@
 // Version 2.0.


-Features:
-	Latch optimizations {Need here}
-	Task I/Os connecting to non-simple variables.
-	Fix ordering of each bit separately in a signal (mips)
+Language support:
+	* Fix ordering of each bit separately in a signal (mips)
 		assign b[3:0] = b[7:4];  assign b[7:4] = in;
-	Support gate primitives/ cell libraries from xilinx, etc
-	Assign dont_care value to an 1'bzzz assignment
-	Function to eval combo logic after /*verilator public*/ functions [gwaters]
-	Support generated clocks (correctness)
-	?gcov coverage
-	Selectable SystemC types based on widths (see notes below)
-	Coverage
-		Points should be per-scope like everything else rather then per-module
-		Expression coverage (see notes)
-	Constant functions for widths, etc, IE   "input [log2(PARAM):0] xx;"
-	More Verilog 2001 Support
-		(* *) Attributes  (just ignore -- preprocessor?)
-		Real numbers (NEVER)
-		Recursive functions (NEVER)
-		Verilog configuration files (NEVER)
-	DPI to define C/C++ calls from Verilog
+	* Support UDP gate primitives/ cell libraries
+		(have code for combos - problem is sequential udps)
+	* Function to eval combo logic after /*verilator public*/ functions [gwaters]
+	* Support generated clocks (correctness)
+	* Real numbers
+	* Recursive functions
+	* Verilog configuration files
+	* Structs/unions (have starting point)
+	* DPI to define C/C++ calls from Verilog
+	* Expression coverage (see notes)
+	* Better tristate support

 Long-term Features
-	Assertions
-	VHDL parser  [Philips]
-	Tristate support
-	SystemPerl integration
-	Multithreaded execution
+	* Assertions
+	* Tristate support
+	* Multithreaded execution
+
+Configure/Make/Install
+	* Full MSVC++ compilation (does scons support this?) (4.000?)
+	* Distribute with flex/bison already expanded?
+	  Flex library not needed.  Probably too difficult to be worth it.
+	* Integrate SystemPerl coverage
+    	      (Note in /usr/include there are no upper cased include files.)
+		Coverage.pm		-- Need all functionality, but in C?
+		Coverage/Item.pm	-- Need all functionality, but in C?
+		Coverage/ItemKey.pm	-- Need all functionality, but in C?
+		sp_preproc		-- Some steps in here need to be moved to generated C
+		src/Sp.cpp		-- n/a
+		src/SpCommon.h		-- mostly overlaps verilatedos.h
+		src/SpCoverage.cpp/h	-- All needed
+		src/SpFunctor.cpp/h	-- No longer used
+		src/SpTraceVcd.cpp/h	-- MOVED
+		src/SpTraceVcdC.cpp/h	-- MOVED
+		src/sp_log.cpp/h	-- Not needed
+		src/systemperl.h	-- some stuff may be cut
+		vcoverage		-- Need all functionality, but in C?

 Testing:
-	Capture all inputs into global "rerun it" file
-	Code to make wrapper that sets signals, so can do comparison checks
-	New random program generator
-	Better graph viewer with search and zoom
-	Port and test against opencores.org code
+	* Move test_c/sp/v/verilated into test_regress format (4.000?)
+	* Capture all inputs into global "rerun it" file
+	* Code to make wrapper that sets signals, so can do comparison checks
+	* New random program generator
+	* Better graph viewer with search and zoom
+	* Port and test against opencores.org code

 Usability:
-	Better reporting of unopt problems, including what lines of code
-	Report more errors (all of them?) before exiting [Eugene Weber]
+	* Detect and pre-remove most UNOPTFLATs (4.000) 
+	* Better reporting of unopt problems, including what lines of code
+	* Report more errors (all of them?) before exiting [Eugene Weber]
+	* Auto-create scons config files
+	* Print version/etc message at runtime. (4.000?)
+	  Include number of lines of code, percent comments, code complexity measurement
+	  <-80chars------------------------------------------------------------------->
+	  Verilator 3.600 - fast, free, open-sourced.  Copyright 2001-2010.
+	  Verilated #### modules, #### instances, ##### sigs,
+	  	    #### non-comment lines, ##### ops, ### KB model size
+	* Default the --l2name to remove extra "v" level of hierarchy (flag to make "top")

 Internal Code:
-	Eliminate the AstNUser* passed to all visitors; its only needed in V3Width,
-	and removing it will speed up and simplify all the other code.
-	V3Graph should be templated container type, taking in Vertex + Edge types
+	* Eliminate the AstNUser* passed to all visitors; it's only needed in V3Width,
+	  and removing it will speed up and simplify all the other code.
+	* V3Graph should be templated container type, taking in Vertex + Edge types
+	* Rename V3PreLex etc to match VerilogPerl filenames
+	* Instead of string, have a VEncodedString/VIdString which contains __DOT__ish
+	  things, to reduce bugs.  Also add _20 trailing space to \ encoded names. (4.000)
+
+Runtime:
+	* New evaluation loop   ~/src/verilator/notes/event_loop.txt (4.000?)
+	* Remove all private internal functions from top level wrapper header, move
+	  to new level (4.000?)
+	* Completely standalone simulation (4.000)
+	   main() records arguments for $test$plusvars
+	   instantiates top,
+	   does tracing  (support $dump?)
+	   calls top->simulateForever()
+	   exits

 Performance:
-	Constant propagation
+	* Latch optimizations
+	* Constant propagation
 		Extra cleaning AND:  1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF))
 		Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #))
-	Gated clock and latch conversion to flops.  [JeanPaul Vanitegem]
+	* Gated clock and latch conversion to flops.  [JeanPaul Vanitegem]
 		Could propagate the AND into pos/negedges and let domaining optimize.
-	Negedge reset
+	* Negedge reset
 		Switch to remove negedges that don't matter
 		Can't remove async resets from control flops (like in syncronizers)
-	If all references to array have a constant index, blow up into separate signals-per-index
-	Multithreaded execution
-	Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
-	Move _last sets and all other combo logic inside master
+	* If all references to array have a constant index, blow up into separate signals-per-index
+	* Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
+	* Move _last sets and all other combo logic inside master
 		if() that triggers on all possible sense items
-	Rewrite and combine V3Life, V3Subst
+	* Rewrite and combine V3Life, V3Subst
 		If block temp only ever set in one place to constant, propagate it
 			Used in t_mem for array delayed assignments
 		Replace variables if set later in same cfunc branch
 			See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop
-	Same assignment on both if branches
+	* Same assignment on both if branches
 		"if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..."
 		Careful though, as b could appear in the statement or multiple times in statement
 		(Could just require exatly two 'b's in statement)
-	Simplify XOR/XNOR/AND/OR bit selection trees
+	* Simplify XOR/XNOR/AND/OR bit selection trees
 		Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 )
-	Combine variables into wider elements
+	* Combine variables into wider elements
 		Parallel statements on different bits should become single signal
 		Variables that are always consumed in "parallel" can be joined
-	Duplicate assignments in gate optimization
+	* Duplicate assignments in gate optimization
 		Common to have many separate posedge blocks, each with identical
 		reset_r <= rst_in
-	*If signal is used only once (not counting trace), always gate substitute
+	* If signal is used only once (not counting trace), always gate substitute
 		Don't merge if any combining would form circ logic (out goes back to in)
-	Multiple assignments each bit can become single assign with concat
+	* Multiple assignments each bit can become single assign with concat
 		Make sure a SEL of a CONCAT can get the single bit back.
-	Usually blocks/values
+	* Usually blocks/values
 		Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out
-	Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
-	Allow Split of case statements without a $display/$stop
-	I-cache packing improvements (what/how?)
-	Data cache organization (order of vars in class)
+	* Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
+	* Allow Split of case statements without a $display/$stop
+	* I-cache packing improvements (what/how?)
+	* Data cache organization (order of vars in class)
 		First have clocks,
 		then bools instead of uint32_t's
 		then based on what sense list they come from, all outputs, then all inputs
 		finally have any signals part of a "usually" block, or constant.
-	Rather then tracking widths, have a MSB...LSB of this expression
+	* Rather than tracking widths, have a MSB...LSB of this expression
 		(or better, a bitmask of bits relevant in this expression)
-	Track recirculation and convert into clock-enables
-	Clock enables should become new clocking domains for speed
-	If floped(a) & flopped(b) and no other a&b, then instead flop(a&b).
-	Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)
-
-	All of the temp vars that get set, exp pre_ vars and never feedback
-	(not flops) don't need to be stored in the structs, but instead can
-	be per-invocation, and even better register-colored-like to reuse
-	the space.  This will greatly reduce the data footprint.
-
-
-//**********************************************************************
-//* Eventual tristate bus Stuff allowed (old verilator)
-
- 1) Tristate assignments must be continuous assignments
-    The RHS of a tristate assignment can be the following
-       a) a node (tristate or non-tristate)
-       b) a constant (must be all or no z's)
-	    x'b0, x'bz, x{x'bz}, x{x'b0} -> are allowed
-       c) a conditional whose possible values are (a) or (b)
-
- 2) One can lose that fact that a node is a tristate node.  This happens
-    if a tristate node is assigned to a 'standard' node, or is used on
-    RHS of a conditional. The following infer tristate signals:
-       a) inout <SIGNAL>
-       b) tri <SIGNAL>
-       c) assigning to 'Z' (maybe through a conditional)
-    Note: tristate-ness of an output port determined only by
-          statements in the module (not the instances it calls)
-
- 4) Tristate variables can't be multidimensional arrays
- 5) Only check tristate contention between modules (not within!)
- 6) Only simple compares with 'Z' are allowed (===)
-
+	* Track recirculation and convert into clock-enables
+	* Clock enables should become new clocking domains for speed
+	* If flopped(a) & flopped(b) and no other a&b, then instead flop(a&b).
+	* Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)
--- a/internals.pod
+++ b/internals.pod
@@ -40,7 +40,134 @@ Modify the later visitor functions to process the new feature as needed.

 =back

-=head1 DEBUG OUTPUT/ TREE FILES
+=head1 CODE FLOWS
+
+=head2 Verilator Flow
+
+The main flow of Verilator can be followed by reading the Verilator.cpp
+process() function:
+
+First, the files specified on the command line are read.  Reading involves
+preprocessing, then lexical analysis with Flex and parsing with Bison.
+This produces an abstract syntax tree (AST) representation of the design,
+which is what is visible in the .tree files described below.
+
+Cells are then linked, which will read and parse additional files as above.
+
+Function, variable, and other references are linked to their definitions.
+
+Parameters are resolved and the design is elaborated.
+
+Verilator then performs many additional edits and optimizations on the
+hierarchical design.  This includes coverage, assertions, X elimination,
+inlining, constant propagation, and dead code elimination.
+
+References in the design are then pseudo-flattened.  Each module's variables
+and functions get "Scope" references.  A scope reference is an occurrence of
+that un-flattened variable in the flattened hierarchy.  A module that occurs
+only once in the hierarchy will have a single scope and single VarScope for
+each variable.  A module that occurs twice will have a scope for each
+occurrence, and two VarScopes for each variable.  This allows optimizations
+to proceed across the flattened design, while still preserving the
+hierarchy.
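
The scope idea can be illustrated with a small standalone sketch.  This is
not Verilator's code: Module, VarScope, makeScopes() and flatten() are
hypothetical stand-ins, and the "u0"/"u1" instance names are invented for
the example.  It only shows that a module instantiated twice yields one
VarScope per variable per occurrence, while the module definition itself
stays shared.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not Verilator's real AST classes.
struct Module {
    std::string name;
    std::vector<std::string> vars;     // variables declared in the module
    std::vector<const Module*> cells;  // child module instances (not owned)
};

// One occurrence of a module variable in the flattened hierarchy.
struct VarScope {
    std::string path;  // hierarchical name, e.g. "top.u0.q"
};

// Walk the instance hierarchy.  Every occurrence of a module gets its own
// scope path, and every variable gets one VarScope per occurrence.
void makeScopes(const Module& mod, const std::string& path,
                std::vector<VarScope>& out) {
    for (const std::string& v : mod.vars) out.push_back({path + "." + v});
    for (std::size_t i = 0; i < mod.cells.size(); ++i) {
        // Instance names "u0", "u1", ... are made up for this sketch.
        makeScopes(*mod.cells[i], path + ".u" + std::to_string(i), out);
    }
}

// Convenience wrapper: flatten starting at the top module.
std::vector<VarScope> flatten(const Module& top) {
    std::vector<VarScope> out;
    makeScopes(top, top.name, out);
    return out;
}
```

With a "flop" module holding one variable "q" and instantiated twice under
"top", flatten() returns a VarScope for the top-level variable plus two for
"q", one per instance.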
+
+Additional edits and optimizations proceed on the pseudo-flat design.  These
+include module references, function inlining, loop unrolling, variable
+lifetime analysis, lookup table creation, always splitting, and logic gate
+simplifications (pushing inverters, etc).
+
+Verilator orders the code.  Best case, this results in a single "eval"
+function which has all always statements flowing from top to bottom with no
+loops.
+
+Verilator mostly removes the flattening, so that code may be shared between
+multiple invocations of the same module.  It localizes variables, combines
+identical functions, expands macros to C primitives, adds branch prediction
+hints, and performs additional constant propagation.
+
+Verilator finally writes the C++ modules.
+
+=head2 Verilated Flow
+
+The evaluation loop generated by Verilator is designed to allow a single
+function to perform evaluation under most situations.
+
+On the first evaluation, the Verilated code calls initial blocks, and then
+"settles" the modules, by evaluating functions (from always statements)
+until all signals are stable.
+
+On other evaluations, the Verilated code detects what input signals have
+changed.  If any are clocks, it calls the appropriate sequential functions
+(from always @ posedge statements).  Interspersed with sequential functions
+it calls combo functions (from always @*).  After this is complete, it
+detects any changes due to combo loops or internally generated clocks, and
+if any are found it must reevaluate the model.
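
A minimal sketch of this loop, under stated assumptions: the Model struct
below is a hand-written toy, not generated Verilated code, and all of its
member names are hypothetical.  It models an input clock "clk", an
internally generated clock "div" that toggles on each posedge of clk, a
flop "count" clocked by div, and a combo output "doubled".  The change
detection at the end of each pass forces another pass when the internal
clock moved.

```cpp
#include <cassert>

// Toy stand-in for a Verilated model; every name here is hypothetical.
struct Model {
    bool clk = false;  // input clock
    bool div = false;  // internally generated clock, toggles on posedge clk
    int count = 0;     // flop clocked by posedge div
    int doubled = 0;   // combo output derived from count
    int passes = 0;    // passes needed by the most recent eval()

    void eval() {
        passes = 0;
        if (m_first) {
            // First evaluation: record starting values so that no
            // spurious edges are seen, then settle in the loop below.
            m_first = false;
            m_clkLast = clk;
            m_divLast = div;
        }
        bool again;
        do {
            ++passes;
            // Detect clock edges since the previous pass.
            const bool clkPos = clk && !m_clkLast;
            const bool divPos = div && !m_divLast;
            m_clkLast = clk;
            m_divLast = div;
            // Sequential functions (always @ posedge ...).
            if (clkPos) div = !div;
            if (divPos) ++count;
            // Combo function (always @*).
            doubled = count * 2;
            // Change detect: an internally generated clock moved, so the
            // model must be reevaluated.
            again = (div != m_divLast);
        } while (again);
    }

private:
    bool m_first = true;
    bool m_clkLast = false;
    bool m_divLast = false;
};
```

In this toy, a rising edge on clk takes two passes: the first toggles div,
and the second sees the resulting div edge and updates count.  An eval()
with no input change completes in one pass.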
+
+For SystemC code, the eval() function is wrapped in a SystemC SC_METHOD,
+sensitive to all inputs.  (Ideally it would only be sensitive to clocks and
+combo inputs, but tracing requires all signals to cause evaluation, and the
+performance difference is small.)
+
+If tracing is enabled, a callback examines all variables in the design for
+changes, and writes the trace for each change.  To accelerate this process
+the evaluation process records a bitmask of variables that might have
+changed; if clear, checking those signals for changes may be skipped.
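
The activity bitmask can be sketched as follows.  This is an illustration
only, assuming one activity bit per signal; TracedModel, Tracer and their
members are hypothetical names, and the real grouping of variables into
bits is not described here.  A set bit means "might have changed", so the
tracer still compares against the previous value before writing.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model: each setter marks the activity bit for its signal.
struct TracedModel {
    int a = 0;              // activity bit 0
    int b = 0;              // activity bit 1
    uint32_t activity = 0;  // bit set => signal might have changed

    void setA(int v) { a = v; activity |= 1u << 0; }
    void setB(int v) { b = v; activity |= 1u << 1; }
};

// Toy tracer: only compares signals whose activity bit is set.
struct Tracer {
    int oldA = 0;
    int oldB = 0;
    int compares = 0;              // value comparisons actually performed
    std::vector<std::string> log;  // stands in for the trace file

    void dump(TracedModel& m) {
        if (m.activity & (1u << 0)) {
            ++compares;
            if (m.a != oldA) {
                oldA = m.a;
                log.push_back("a=" + std::to_string(m.a));
            }
        }
        if (m.activity & (1u << 1)) {
            ++compares;
            if (m.b != oldB) {
                oldB = m.b;
                log.push_back("b=" + std::to_string(m.b));
            }
        }
        m.activity = 0;  // everything has been checked; start clean
    }
};
```

Calling dump() when no setter ran since the previous dump performs no
comparisons at all, which is the saving the bitmask is meant to provide.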
+
+=head1 VISITOR FUNCTIONS
+
+=head2 Passing Variables
+
+There are three ways data is passed between visitor functions.
+
+1. A visitor-class member variable.  This is generally for passing "parent"
+information down to children.  m_modp is a common example.  It's set to
+NULL in the constructor; the visitor for that node (the AstModule visitor)
+sets it, iterates the children, and then clears it.  Children under an
+AstModule will see it set, while nodes elsewhere will see it clear.  If
+there can be nested items (for example an AstFor under an AstFor) the
+variable needs to be save-set-restored in the AstFor visitor, otherwise
+exiting the lower for will lose the upper for's setting.
+
+2. User() attributes.  Each node has 5 ->user() number or ->userp() pointer
+utility values (a common technique lifted from graph traversal packages).
+A visitor first clears the one it wants to use by calling
+AstNode::user#ClearTree(), then it can mark any node's user() with whatever
+data it wants.  Readers just call nodep->user(), but may need to cast
+appropriately, so you'll often see nodep->userp()->castSOMETYPE().  At the
+top of each visitor are comments describing how the user() stuff applies to
+that visitor class.  For example:
+
+    // NODE STATE
+    // Cleared entire netlist
+    //   AstModule::user1p()     // bool. True to inline this module
+
+This says that at the AstNetlist user1ClearTree() is called.  Each
+AstModule's user1() is used to indicate if we're going to inline it.
+
+These comments are important to make sure a user#() on a given AstNode type
+is never being used for two different purposes.
+
+Note that calling user#ClearTree is fast; it doesn't walk the tree, so it's
+ok to call fairly often.  For example, it's commonly called on every
+module.
+
+3. Parameters can be passed between the visitors in close to the "normal"
+function caller to callee way.  This is the second "vup" parameter that is
+ignored on most of the visitor functions.  V3Width does this, but it proved
+more messy than the above and is deprecated.  (V3Width was nearly the first
+module written.  Someday this scheme may be removed, as it slows the
+program down to have to pass vup everywhere.)
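
The save-set-restore pattern from method 1 can be sketched in isolation.
The Node and Visitor types below are hypothetical simplifications, not the
real AstNode or AstNVisitor classes; only the handling of the member
variable (here m_forp) is the point.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified node: either a "for" with children or a leaf.
struct Node {
    bool isFor;
    std::string name;
    std::vector<const Node*> children;  // not owned
};

class Visitor {
    const Node* m_forp = nullptr;  // enclosing for; nullptr outside any for

public:
    std::vector<std::string> seen;  // "<leaf>@<enclosing for>"

    void visit(const Node& n) {
        if (n.isFor) {
            const Node* const savedp = m_forp;  // save
            m_forp = &n;                        // set
            for (const Node* childp : n.children) visit(*childp);
            m_forp = savedp;                    // restore, do not just clear
        } else {
            const std::string where = m_forp ? m_forp->name : std::string("none");
            seen.push_back(n.name + "@" + where);
        }
    }
};
```

If the visitor cleared m_forp to nullptr instead of restoring the saved
value, a statement that follows a nested for inside the outer for would
wrongly see no enclosing loop.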
+
+=head1 TESTING
+
+To write a test see notes in the forum and in the verilator.txt manual.
+
+Note you can run the regression tests in parallel; see the
+test_regress/driver.pl script -j flag.
+
+=head1 DEBUGGING
+
+=head2 --debug

 When you run with --debug there are two primary output file types placed into
 the obj_dir, .tree and .dot files.
@@ -94,59 +221,7 @@ variable is an output.

 =back

-=head1 TESTING
-
-To write a test see notes in the forum and in the verilator.txt manual.
-
-Note you can run the regression tests in parallel; see the
-test_regress/driver.pl script -j flag.
-
-=head1 VISITOR FUNCTIONS
-
-=head2 Passing Variables
-
-There's three ways data is passed between visitor functions.
-
-1. A visitor-class member variable.  This is generally for passing "parent"
-information down to children.  m_modp is a common example.  It's set to
-NULL in the constructor, where that node (AstModule visitor) sets it, then
-the children are iterated, then it's cleared.  Children under an AstModule
-will see it set, while nodes elsewhere will see it clear.  If there can be
-nested items (for example an AstFor under an AstFor) the variable needs to
-be save-set-restored in the AstFor visitor, otherwise exiting the lower for
-will loose the upper for's setting.
-
-2. User() attributes.  Each node has 5 ->user() number or ->userp() pointer
-utility values (a common technique lifted from graph traversal packages).
-A visitor first clears the one it wants to use by calling
-AstNode::user#ClearTree(), then it can mark any node's user() with whatever
-data it wants.  Readers just call nodep->user(), but may need to cast
-appropriately, so you'll often see nodep->userp()->castSOMETYPE().  At the
-top of each visitor are comments describing how the user() stuff applies to
-that visitor class.  For example:
-
-    // NODE STATE
-    // Cleared entire netlist
-    //   AstModule::user1p()     // bool. True to inline this module
-
-This says that at the AstNetlist user1ClearTree() is called.  Each
-AstModule's is user1() is used to indicate if we're going to inline it.
-
-These comments are important to make sure a user#() on a given AstNode type
-is never being used for two different purposes.
-
-Note that calling user#ClearTree is fast, it doesn't walk the tree, so it's
-ok to call fairly often.  For example, it's commonly called on every
-module.
-
-3. Parameters can be passed between the visitors in close to the "normal"
-function caller to callee way.  This is the second "vup" parameter that is
-ignored on most of the visitor functions.  V3Width does this, but it proved
-more messy than the above and is deprecated.  (V3Width was nearly the first
-module written.  Someday this scheme may be removed, as it slows the
-program down to have to pass vup everywhere.)
-
-=head1 DEBUGGING WITH GDB
+=head2 Debugging with GDB

 The test_regress/driver.pl script accepts --debug --gdb to start Verilator
 under gdb.  You can also use --debug --gdbbt to just backtrace and then