diff --git a/TODO b/TODO index 6c9fb70c7..6a01700b7 100644 --- a/TODO +++ b/TODO @@ -6,129 +6,134 @@ // Version 2.0. -Features: - Latch optimizations {Need here} - Task I/Os connecting to non-simple variables. - Fix ordering of each bit separately in a signal (mips) +Language support: + * Fix ordering of each bit separately in a signal (mips) assign b[3:0] = b[7:4]; assign b[7:4] = in; - Support gate primitives/ cell libraries from xilinx, etc - Assign dont_care value to an 1'bzzz assignment - Function to eval combo logic after /*verilator public*/ functions [gwaters] - Support generated clocks (correctness) - ?gcov coverage - Selectable SystemC types based on widths (see notes below) - Coverage - Points should be per-scope like everything else rather then per-module - Expression coverage (see notes) - Constant functions for widths, etc, IE "input [log2(PARAM):0] xx;" - More Verilog 2001 Support - (* *) Attributes (just ignore -- preprocessor?) - Real numbers (NEVER) - Recursive functions (NEVER) - Verilog configuration files (NEVER) - DPI to define C/C++ calls from Verilog + * Support UDP gate primitives/ cell libraries + (have code for combos - problem is sequential udps) + * Function to eval combo logic after /*verilator public*/ functions [gwaters] + * Support generated clocks (correctness) + * Real numbers + * Recursive functions + * Verilog configuration files + * Structs/unions (have starting point) + * DPI to define C/C++ calls from Verilog + * Expression coverage (see notes) + * Better tristate support Long-term Features - Assertions - VHDL parser [Philips] - Tristate support - SystemPerl integration - Multithreaded execution + * Assertions + * Tristate support + * Multithreaded execution + +Configure/Make/Install + * Full MSVC++ compilation (does scons support this?) (4.000?) + * Distribute with flex/bison already expanded? + Flex library not needed. Probably too difficult to be worth it. 
+ * Integrate SystemPerl coverage + (Note in /usr/include there are no upper cased include files.) + Coverage.pm -- Need all functionality, but in C? + Coverage/Item.pm -- Need all functionality, but in C? + Coverage/ItemKey.pm -- Need all functionality, but in C? + sp_preproc -- Some steps in here need to be moved to generated C + src/Sp.cpp -- n/a + src/SpCommon.h -- mostly overlaps verilatedos.h + src/SpCoverage.cpp/h -- All needed + src/SpFunctor.cpp/h -- No longer used + src/SpTraceVcd.cpp/h -- MOVED + src/SpTraceVcdC.cpp/h -- MOVED + src/sp_log.cpp/h -- Not needed + src/systemperl.h -- some stuff may be cut + vcoverage -- Need all functionality, but in C? Testing: - Capture all inputs into global "rerun it" file - Code to make wrapper that sets signals, so can do comparison checks - New random program generator - Better graph viewer with search and zoom - Port and test against opencores.org code + * Move test_c/sp/v/verilated into test_regress format (4.000?) + * Capture all inputs into global "rerun it" file + * Code to make wrapper that sets signals, so can do comparison checks + * New random program generator + * Better graph viewer with search and zoom + * Port and test against opencores.org code Usability: - Better reporting of unopt problems, including what lines of code - Report more errors (all of them?) before exiting [Eugene Weber] + * Detect and pre-remove most UNOPTFLATs (4.000) + * Better reporting of unopt problems, including what lines of code + * Report more errors (all of them?) before exiting [Eugene Weber] + * Auto-create scons config files + * Print version/etc message at runtime. (4.000?) + Include number of lines of code, percent comments, code complexity measurement + <-80chars-------------------------------------------------------------------> + Verilator 3.600 - fast, free, open-sourced. Copyright 2001-2010. 
+ Verilated #### modules, #### instances, ##### sigs, + #### non-comment lines, ##### ops, ### KB model size + * Default the --l2name to remove extra "v" level of hierarchy (flag to make "top") Internal Code: - Eliminate the AstNUser* passed to all visitors; its only needed in V3Width, - and removing it will speed up and simplify all the other code. - V3Graph should be templated container type, taking in Vertex + Edge types + * Eliminate the AstNUser* passed to all visitors; it's only needed in V3Width, + and removing it will speed up and simplify all the other code. + * V3Graph should be templated container type, taking in Vertex + Edge types + * Rename V3PreLex etc to match VerilogPerl filenames + * Instead of string, have a VEncodedString/VIdString which contains __DOT__ish + things, to reduce bugs. Also add _20 trailing space to \ encoded names. (4.000) + +Runtime: + * New evaluation loop ~/src/verilator/notes/event_loop.txt (4.000?) + * Remove all private internal functions from top level wrapper header, move + to new level (4.000?) + * Completely standalone simulation (4.000) + main() records arguments for $test$plusvars + instantiates top, + does tracing (support $dump?) + calls top->simulateForever() + exits Performance: - Constant propagation + * Latch optimizations + * Constant propagation Extra cleaning AND: 1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF)) Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #)) - Gated clock and latch conversion to flops. [JeanPaul Vanitegem] + * Gated clock and latch conversion to flops. [JeanPaul Vanitegem] Could propagate the AND into pos/negedges and let domaining optimize. - Negedge reset + * Negedge reset Switch to remove negedges that don't matter Can't remove async resets from control flops (like in syncronizers) - If all references to array have a constant index, blow up into separate signals-per-index - Multithreaded execution - Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings. 
- Move _last sets and all other combo logic inside master + * If all references to array have a constant index, blow up into separate signals-per-index + * Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings. + * Move _last sets and all other combo logic inside master if() that triggers on all possible sense items - Rewrite and combine V3Life, V3Subst + * Rewrite and combine V3Life, V3Subst If block temp only ever set in one place to constant, propagate it Used in t_mem for array delayed assignments Replace variables if set later in same cfunc branch See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop - Same assignment on both if branches + * Same assignment on both if branches "if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..." Careful though, as b could appear in the statement or multiple times in statement (Could just require exatly two 'b's in statement) - Simplify XOR/XNOR/AND/OR bit selection trees + * Simplify XOR/XNOR/AND/OR bit selection trees Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 ) - Combine variables into wider elements + * Combine variables into wider elements Parallel statements on different bits should become single signal Variables that are always consumed in "parallel" can be joined - Duplicate assignments in gate optimization + * Duplicate assignments in gate optimization Common to have many separate posedge blocks, each with identical reset_r <= rst_in - *If signal is used only once (not counting trace), always gate substitute + * If signal is used only once (not counting trace), always gate substitute Don't merge if any combining would form circ logic (out goes back to in) - Multiple assignments each bit can become single assign with concat + * Multiple assignments each bit can become single assign with concat Make sure a SEL of a CONCAT can get the single bit back. 
- Usually blocks/values + * Usually blocks/values Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out - Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching - Allow Split of case statements without a $display/$stop - I-cache packing improvements (what/how?) - Data cache organization (order of vars in class) + * Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching + * Allow Split of case statements without a $display/$stop + * I-cache packing improvements (what/how?) + * Data cache organization (order of vars in class) First have clocks, then bools instead of uint32_t's then based on what sense list they come from, all outputs, then all inputs finally have any signals part of a "usually" block, or constant. - Rather then tracking widths, have a MSB...LSB of this expression + * Rather than tracking widths, have a MSB...LSB of this expression (or better, a bitmask of bits relevant in this expression) - Track recirculation and convert into clock-enables - Clock enables should become new clocking domains for speed - If floped(a) & flopped(b) and no other a&b, then instead flop(a&b). - Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal) - - All of the temp vars that get set, exp pre_ vars and never feedback - (not flops) don't need to be stored in the structs, but instead can - be per-invocation, and even better register-colored-like to reuse - the space. This will greatly reduce the data footprint. 
- - -//********************************************************************** -//* Eventual tristate bus Stuff allowed (old verilator) - - 1) Tristate assignments must be continuous assignments - The RHS of a tristate assignment can be the following - a) a node (tristate or non-tristate) - b) a constant (must be all or no z's) - x'b0, x'bz, x{x'bz}, x{x'b0} -> are allowed - c) a conditional whose possible values are (a) or (b) - - 2) One can lose that fact that a node is a tristate node. This happens - if a tristate node is assigned to a 'standard' node, or is used on - RHS of a conditional. The following infer tristate signals: - a) inout - b) tri - c) assigning to 'Z' (maybe through a conditional) - Note: tristate-ness of an output port determined only by - statements in the module (not the instances it calls) - - 4) Tristate variables can't be multidimensional arrays - 5) Only check tristate contention between modules (not within!) - 6) Only simple compares with 'Z' are allowed (===) - + * Track recirculation and convert into clock-enables + * Clock enables should become new clocking domains for speed + * If floped(a) & flopped(b) and no other a&b, then instead flop(a&b). + * Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal) diff --git a/internals.pod b/internals.pod index df6b2d7de..ee27762d2 100644 --- a/internals.pod +++ b/internals.pod @@ -40,7 +40,134 @@ Modify the later visitor functions to process the new feature as needed. =back -=head1 DEBUG OUTPUT/ TREE FILES +=head1 CODE FLOWS + +=head2 Verilator Flow + +The main flow of Verilator can be followed by reading the Verilator.cpp +process() function: + +First, the files specified on the command line are read. Reading involves +preprocessing, then lexical analysis with Flex and parsing with Bison. +This produces an abstract syntax tree (AST) representation of the design, +which is what is visible in the .tree files described below. 
+ +Cells are then linked, which will read and parse additional files as above. + +Function, variable, and other references are linked to their definitions. + +Parameters are resolved and the design is elaborated. + +Verilator then performs many additional edits and optimizations on the +hierarchical design. This includes coverage, assertions, X elimination, +inlining, constant propagation, and dead code elimination. + +References in the design are then pseudo-flattened. Each module's variables +and functions get "Scope" references. A scope reference is an occurrence of +that un-flattened variable in the flattened hierarchy. A module that occurs +only once in the hierarchy will have a single scope and single VarScope for +each variable. A module that occurs twice will have a scope for each +occurrence, and two VarScopes for each variable. This allows optimizations +to proceed across the flattened design, while still preserving the +hierarchy. + +Additional edits and optimizations proceed on the pseudo-flat design. These +include module references, function inlining, loop unrolling, variable +lifetime analysis, lookup table creation, always splitting, and logic gate +simplifications (pushing inverters, etc). + +Verilator orders the code. Best case, this results in a single "eval" +function which has all always statements flowing from top to bottom with no +loops. + +Verilator mostly removes the flattening, so that code may be shared between +multiple invocations of the same module. It localizes variables, combines +identical functions, expands macros to C primitives, adds branch prediction +hints, and performs additional constant propagation. + +Verilator finally writes the C++ modules. + +=head2 Verilated Flow + +The evaluation loop generated by Verilator is designed to allow a single +function to perform evaluation under most situations. 
+ +On the first evaluation, the Verilated code calls initial blocks, and then +"settles" the modules, by evaluating functions (from always statements) +until all signals are stable. + +On other evaluations, the Verilated code detects which input signals have +changed. If any are clocks, it calls the appropriate sequential functions +(from always @ posedge statements). Interspersed with sequential functions +it calls combo functions (from always @*). After this is complete, it +detects any changes due to combo loops or internally generated clocks, and +if one is found, it must reevaluate the model. + +For SystemC code, the eval() function is wrapped in a SystemC SC_METHOD, +sensitive to all inputs. (Ideally it would only be sensitive to clocks and +combo inputs, but tracing requires all signals to cause evaluation, and the +performance difference is small.) + +If tracing is enabled, a callback examines all variables in the design for +changes, and writes the trace for each change. To accelerate this process +the evaluation process records a bitmask of variables that might have +changed; if clear, checking those signals for changes may be skipped. + +=head1 VISITOR FUNCTIONS + +=head2 Passing Variables + +There are three ways data is passed between visitor functions. + +1. A visitor-class member variable. This is generally for passing "parent" +information down to children. m_modp is a common example. It's set to +NULL in the constructor, where that node (AstModule visitor) sets it, then +the children are iterated, then it's cleared. Children under an AstModule +will see it set, while nodes elsewhere will see it clear. If there can be +nested items (for example an AstFor under an AstFor) the variable needs to +be save-set-restored in the AstFor visitor, otherwise exiting the lower for +will lose the upper for's setting. + +2. User() attributes. 
Each node has 5 ->user() number or ->userp() pointer +utility values (a common technique lifted from graph traversal packages). +A visitor first clears the one it wants to use by calling +AstNode::user#ClearTree(), then it can mark any node's user() with whatever +data it wants. Readers just call nodep->user(), but may need to cast +appropriately, so you'll often see nodep->userp()->castSOMETYPE(). At the +top of each visitor are comments describing how the user() stuff applies to +that visitor class. For example: + + // NODE STATE + // Cleared entire netlist + // AstModule::user1p() // bool. True to inline this module + +This says that at the AstNetlist user1ClearTree() is called. Each +AstModule's user1() is used to indicate if we're going to inline it. + +These comments are important to make sure a user#() on a given AstNode type +is never being used for two different purposes. + +Note that calling user#ClearTree is fast; it doesn't walk the tree, so it's +ok to call fairly often. For example, it's commonly called on every +module. + +3. Parameters can be passed between the visitors in close to the "normal" +function caller to callee way. This is the second "vup" parameter that is +ignored on most of the visitor functions. V3Width does this, but it proved +messier than the above and is deprecated. (V3Width was nearly the first +module written. Someday this scheme may be removed, as it slows the +program down to have to pass vup everywhere.) + +=head1 TESTING + +To write a test, see notes in the forum and in the verilator.txt manual. + +Note you can run the regression tests in parallel; see the +test_regress/driver.pl script -j flag. + +=head1 DEBUGGING + +=head2 --debug When you run with --debug there are two primary output file types placed into the obj_dir, .tree and .dot files. @@ -94,59 +221,7 @@ variable is an output. =back -=head1 TESTING - -To write a test see notes in the forum and in the verilator.txt manual. 
- -Note you can run the regression tests in parallel; see the -test_regress/driver.pl script -j flag. - -=head1 VISITOR FUNCTIONS - -=head2 Passing Variables - -There's three ways data is passed between visitor functions. - -1. A visitor-class member variable. This is generally for passing "parent" -information down to children. m_modp is a common example. It's set to -NULL in the constructor, where that node (AstModule visitor) sets it, then -the children are iterated, then it's cleared. Children under an AstModule -will see it set, while nodes elsewhere will see it clear. If there can be -nested items (for example an AstFor under an AstFor) the variable needs to -be save-set-restored in the AstFor visitor, otherwise exiting the lower for -will loose the upper for's setting. - -2. User() attributes. Each node has 5 ->user() number or ->userp() pointer -utility values (a common technique lifted from graph traversal packages). -A visitor first clears the one it wants to use by calling -AstNode::user#ClearTree(), then it can mark any node's user() with whatever -data it wants. Readers just call nodep->user(), but may need to cast -appropriately, so you'll often see nodep->userp()->castSOMETYPE(). At the -top of each visitor are comments describing how the user() stuff applies to -that visitor class. For example: - - // NODE STATE - // Cleared entire netlist - // AstModule::user1p() // bool. True to inline this module - -This says that at the AstNetlist user1ClearTree() is called. Each -AstModule's is user1() is used to indicate if we're going to inline it. - -These comments are important to make sure a user#() on a given AstNode type -is never being used for two different purposes. - -Note that calling user#ClearTree is fast, it doesn't walk the tree, so it's -ok to call fairly often. For example, it's commonly called on every -module. - -3. Parameters can be passed between the visitors in close to the "normal" -function caller to callee way. 
This is the second "vup" parameter that is -ignored on most of the visitor functions. V3Width does this, but it proved -more messy than the above and is deprecated. (V3Width was nearly the first -module written. Someday this scheme may be removed, as it slows the -program down to have to pass vup everywhere.) - -=head1 DEBUGGING WITH GDB +=head2 Debugging with GDB The test_regress/driver.pl script accepts --debug --gdb to start Verilator under gdb. You can also use --debug --gdbbt to just backtrace and then