diff --git a/TODO b/TODO index 5e696713b..99cf267c7 100755 --- a/TODO +++ b/TODO @@ -5,119 +5,115 @@ // Lesser General Public License Version 3 or the Perl Artistic License // Version 2.0. +* Language support: +** Fix ordering of each bit separately in a signal (mips) + assign b[3:0] = b[7:4]; assign b[7:4] = in; +** Support UDP gate primitives/ cell libraries + (have code for combos - problem is sequential udps) +** Function to eval combo logic after /*verilator public*/ functions [gwaters] +** Support generated clocks (correctness) +** Recursive functions +** Verilog configuration files +** Expression coverage (see notes) +** Better tristate support +** UVM -Language support: - * Fix ordering of each bit separately in a signal (mips) - assign b[3:0] = b[7:4]; assign b[7:4] = in; - * Support UDP gate primitives/ cell libraries - (have code for combos - problem is sequential udps) - * Function to eval combo logic after /*verilator public*/ functions [gwaters] - * Support generated clocks (correctness) - * Real numbers - * Recursive functions - * Verilog configuration files - * Structs/unions (have starting point) - * DPI to define C/C++ calls from Verilog - * Expression coverage (see notes) - * Better tristate support +* Long-term Features +** Assertions +** Tristate support -Long-term Features - * Assertions - * Tristate support - * Multithreaded execution +* Configure/Make/Install +** Distribute with flex/bison already expanded? + Flex library not needed. Probably too difficult to be worth it. -Configure/Make/Install - * Distribute with flex/bison already expanded? - Flex library not needed. Probably too difficult to be worth it. +* Testing: +** Capture all inputs into global "rerun it" file +** Code to make wrapper that sets signals, so can do comparison checks +** New random program generator +** Better graph viewer with search and zoom +** Port and test against opencores.org code +** // verilator debug in code so can see only tree affecting those nodes -Testing: - * Capture all inputs into global "rerun it" file - * Code to make wrapper that sets signals, so can do comparison checks - * New random program generator - * Better graph viewer with search and zoom - * Port and test against opencores.org code - * // verilator debug in code so can see only tree affecting those nodes +* Usability: +** Detect and pre-remove most UNOPTFLATs +** Better reporting of unopt problems, including what lines of code +** Report more errors (all of them?) before exiting [Eugene Weber] +** Auto-create scons config files +** Print version/etc message at runtime. (4.000?) + Include number of lines of code, percent comments, code complexity measurement + <-80chars-------------------------------------------------------------------> + Verilator 3.600 - The fast free open-sourced simulator. Copyright 2001-2013. + Verilated #### modules, #### instances, ##### sigs, + #### non-comment lines, ##### ops, ### KB model size -Usability: - * Detect and pre-remove most UNOPTFLATs (4.000) - * Better reporting of unopt problems, including what lines of code - * Report more errors (all of them?) before exiting [Eugene Weber] - * Auto-create scons config files - * Print version/etc message at runtime. (4.000?) - Include number of lines of code, percent comments, code complexity measurement - <-80chars-------------------------------------------------------------------> - Verilator 3.600 - The fast free open-sourced simulator. Copyright 2001-2013. - Verilated #### modules, #### instances, ##### sigs, - #### non-comment lines, ##### ops, ### KB model size +* Lint: +** CDCRSTLOGIC should allow filtering with paths + "waive CDCRSTLOGIC --from a.b.sig --to a.c.sig --via OR" -Lint: - * CDCRSTLOGIC should allow filtering with paths - "waive CDCRSTLOGIC --from a.b.sig --to a.c.sig --via OR" +* Internal Code: +** A Visitor class that understands how to traverse data types +** V3Graph should be templated container type, taking in Vertex + Edge types +** Instead of string, have an VEncodedString/VIdString which contains __DOT__ish + things, to reduce bugs. Also add _20 trailing space to \ encoded names. -Internal Code: - * A Visitor class that understands how to traverse data types - * V3Graph should be templated container type, taking in Vertex + Edge types - * Instead of string, have an VEncodedString/VIdString which contains __DOT__ish - things, to reduce bugs. Also add _20 trailing space to \ encoded names. (4.000) +* Runtime: +** New evalulation loop ~/src/verilator/notes/event_loop.txt (4.000?) +** Remove all private internal functions from top level wrapper header, move + to new level +** Completely standalone simulation + main() records arguments for $test$plusvars + instantiates top, + does tracing (support $dump?) + calls top->simulateForever() + exits -Runtime: - * New evalulation loop ~/src/verilator/notes/event_loop.txt (4.000?) - * Remove all private internal functions from top level wrapper header, move - to new level (4.000?) - * Completely standalone simulation (4.000) - main() records arguments for $test$plusvars - instantiates top, - does tracing (support $dump?) - calls top->simulateForever() - exits - -Performance: - * Latch optimizations - * Constant propagation - Extra cleaning AND: 1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF)) - Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #)) - * Gated clock and latch conversion to flops. [JeanPaul Vanitegem] - Could propagate the AND into pos/negedges and let domaining optimize. - * Negedge reset - Switch to remove negedges that don't matter - Can't remove async resets from control flops (like in syncronizers) - * If all references to array have a constant index, blow up into separate signals-per-index - * Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings. - * Move _last sets and all other combo logic inside master - if() that triggers on all possible sense items - * Rewrite and combine V3Life, V3Subst - If block temp only ever set in one place to constant, propagate it - Used in t_mem for array delayed assignments - Replace variables if set later in same cfunc branch - See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop - * Same assignment on both if branches - "if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..." - Careful though, as b could appear in the statement or multiple times in statement - (Could just require exatly two 'b's in statement) - * Simplify XOR/XNOR/AND/OR bit selection trees - Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 ) - * Combine variables into wider elements - Parallel statements on different bits should become single signal - Variables that are always consumed in "parallel" can be joined - * Duplicate assignments in gate optimization - Common to have many separate posedge blocks, each with identical - reset_r <= rst_in - * If signal is used only once (not counting trace), always gate substitute - Don't merge if any combining would form circ logic (out goes back to in) - * Multiple assignments each bit can become single assign with concat - Make sure a SEL of a CONCAT can get the single bit back. - * Usually blocks/values - Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out - * Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching - * I-cache packing improvements (what/how?) - * Data cache organization (order of vars in class) - First have clocks, - then bools instead of uint32_t's - then based on what sense list they come from, all outputs, then all inputs - finally have any signals part of a "usually" block, or constant. - * Rather then tracking widths, have a MSB...LSB of this expression - (or better, a bitmask of bits relevant in this expression) - * Track recirculation and convert into clock-enables - * Clock enables should become new clocking domains for speed - * If floped(a) & flopped(b) and no other a&b, then instead flop(a&b). - * Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal) +* Performance: +** Latch optimizations +** Constant propagation + Extra cleaning AND: 1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF)) + Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #)) +** Gated clock and latch conversion to flops. [JeanPaul Vanitegem] + Could propagate the AND into pos/negedges and let domaining optimize. +** Negedge reset + Switch to remove negedges that don't matter + Can't remove async resets from control flops (like in syncronizers) +** If all references to array have a constant index, blow up into separate signals-per-index +** Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings. +** Move _last sets and all other combo logic inside master + if() that triggers on all possible sense items +** Rewrite and combine V3Life, V3Subst + If block temp only ever set in one place to constant, propagate it + Used in t_mem for array delayed assignments + Replace variables if set later in same cfunc branch + See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop +** Same assignment on both if branches + "if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..." + Careful though, as b could appear in the statement or multiple times in statement + (Could just require exatly two 'b's in statement) +** Simplify XOR/XNOR/AND/OR bit selection trees + Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 ) +** Combine variables into wider elements + Parallel statements on different bits should become single signal + Variables that are always consumed in "parallel" can be joined +** Duplicate assignments in gate optimization + Common to have many separate posedge blocks, each with identical + reset_r <= rst_in +** If signal is used only once (not counting trace), always gate substitute + Don't merge if any combining would form circ logic (out goes back to in) +** Multiple assignments each bit can become single assign with concat + Make sure a SEL of a CONCAT can get the single bit back. +** Usually blocks/values + Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out +** Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching +** I-cache packing improvements (what/how?) +** Data cache organization (order of vars in class) + First have clocks, + then bools instead of uint32_t's + then based on what sense list they come from, all outputs, then all inputs + finally have any signals part of a "usually" block, or constant. +** Rather then tracking widths, have a MSB...LSB of this expression + (or better, a bitmask of bits relevant in this expression) +** Track recirculation and convert into clock-enables +** Clock enables should become new clocking domains for speed +** If floped(a) & flopped(b) and no other a&b, then instead flop(a&b). +** Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)