Commentary: Rewrite TODO

Wilson Snyder 2018-07-24 07:37:39 -04:00
parent aabb7394c3
commit c011842edf

TODO

@ -5,119 +5,115 @@
// Lesser General Public License Version 3 or the Perl Artistic License
// Version 2.0.
* Language support:
** Fix ordering of each bit separately in a signal (mips)
assign b[3:0] = b[7:4]; assign b[7:4] = in;
** Support UDP gate primitives/ cell libraries
(have code for combos - problem is sequential udps)
** Function to eval combo logic after /*verilator public*/ functions [gwaters]
** Support generated clocks (correctness)
** Recursive functions
** Verilog configuration files
** Expression coverage (see notes)
** Better tristate support
** UVM
Language support:
* Fix ordering of each bit separately in a signal (mips)
assign b[3:0] = b[7:4]; assign b[7:4] = in;
* Support UDP gate primitives/ cell libraries
(have code for combos - problem is sequential udps)
* Function to eval combo logic after /*verilator public*/ functions [gwaters]
* Support generated clocks (correctness)
* Real numbers
* Recursive functions
* Verilog configuration files
* Structs/unions (have starting point)
* DPI to define C/C++ calls from Verilog
* Expression coverage (see notes)
* Better tristate support
* Long-term Features
** Assertions
** Tristate support
Long-term Features
* Assertions
* Tristate support
* Multithreaded execution
* Configure/Make/Install
** Distribute with flex/bison already expanded?
Flex library not needed. Probably too difficult to be worth it.
Configure/Make/Install
* Distribute with flex/bison already expanded?
Flex library not needed. Probably too difficult to be worth it.
* Testing:
** Capture all inputs into global "rerun it" file
** Code to make wrapper that sets signals, so can do comparison checks
** New random program generator
** Better graph viewer with search and zoom
** Port and test against opencores.org code
** // verilator debug in code so can see only tree affecting those nodes
Testing:
* Capture all inputs into global "rerun it" file
* Code to make wrapper that sets signals, so can do comparison checks
* New random program generator
* Better graph viewer with search and zoom
* Port and test against opencores.org code
* // verilator debug in code so can see only tree affecting those nodes
* Usability:
** Detect and pre-remove most UNOPTFLATs
** Better reporting of unopt problems, including what lines of code
** Report more errors (all of them?) before exiting [Eugene Weber]
** Auto-create scons config files
** Print version/etc message at runtime. (4.000?)
Include number of lines of code, percent comments, code complexity measurement
<-80chars------------------------------------------------------------------->
Verilator 3.600 - The fast free open-sourced simulator. Copyright 2001-2013.
Verilated #### modules, #### instances, ##### sigs,
#### non-comment lines, ##### ops, ### KB model size
Usability:
* Detect and pre-remove most UNOPTFLATs (4.000)
* Better reporting of unopt problems, including what lines of code
* Report more errors (all of them?) before exiting [Eugene Weber]
* Auto-create scons config files
* Print version/etc message at runtime. (4.000?)
Include number of lines of code, percent comments, code complexity measurement
<-80chars------------------------------------------------------------------->
Verilator 3.600 - The fast free open-sourced simulator. Copyright 2001-2013.
Verilated #### modules, #### instances, ##### sigs,
#### non-comment lines, ##### ops, ### KB model size
* Lint:
** CDCRSTLOGIC should allow filtering with paths
"waive CDCRSTLOGIC --from a.b.sig --to a.c.sig --via OR"
Lint:
* CDCRSTLOGIC should allow filtering with paths
"waive CDCRSTLOGIC --from a.b.sig --to a.c.sig --via OR"
* Internal Code:
** A Visitor class that understands how to traverse data types
** V3Graph should be templated container type, taking in Vertex + Edge types
** Instead of string, have a VEncodedString/VIdString which contains __DOT__ish
things, to reduce bugs. Also add _20 trailing space to \ encoded names.
Internal Code:
* A Visitor class that understands how to traverse data types
* V3Graph should be templated container type, taking in Vertex + Edge types
* Instead of string, have a VEncodedString/VIdString which contains __DOT__ish
things, to reduce bugs. Also add _20 trailing space to \ encoded names. (4.000)
* Runtime:
** New evaluation loop ~/src/verilator/notes/event_loop.txt (4.000?)
** Remove all private internal functions from top level wrapper header, move
to new level
** Completely standalone simulation
main() records arguments for $test$plusvars
instantiates top,
does tracing (support $dump?)
calls top->simulateForever()
exits
Runtime:
* New evaluation loop ~/src/verilator/notes/event_loop.txt (4.000?)
* Remove all private internal functions from top level wrapper header, move
to new level (4.000?)
* Completely standalone simulation (4.000)
main() records arguments for $test$plusvars
instantiates top,
does tracing (support $dump?)
calls top->simulateForever()
exits
Performance:
* Latch optimizations
* Constant propagation
Extra cleaning AND: 1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF))
Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #))
* Gated clock and latch conversion to flops. [JeanPaul Vanitegem]
Could propagate the AND into pos/negedges and let domaining optimize.
* Negedge reset
Switch to remove negedges that don't matter
Can't remove async resets from control flops (like in synchronizers)
* If all references to array have a constant index, blow up into separate signals-per-index
* Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
* Move _last sets and all other combo logic inside master
if() that triggers on all possible sense items
* Rewrite and combine V3Life, V3Subst
If block temp only ever set in one place to constant, propagate it
Used in t_mem for array delayed assignments
Replace variables if set later in same cfunc branch
See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop
* Same assignment on both if branches
"if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..."
Careful though, as b could appear in the statement or multiple times in statement
(Could just require exactly two 'b's in statement)
* Simplify XOR/XNOR/AND/OR bit selection trees
Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 )
* Combine variables into wider elements
Parallel statements on different bits should become single signal
Variables that are always consumed in "parallel" can be joined
* Duplicate assignments in gate optimization
Common to have many separate posedge blocks, each with identical
reset_r <= rst_in
* If signal is used only once (not counting trace), always gate substitute
Don't merge if any combining would form circ logic (out goes back to in)
* Multiple assignments each bit can become single assign with concat
Make sure a SEL of a CONCAT can get the single bit back.
* Usually blocks/values
Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out
* Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
* I-cache packing improvements (what/how?)
* Data cache organization (order of vars in class)
First have clocks,
then bools instead of uint32_t's
then based on what sense list they come from, all outputs, then all inputs
finally have any signals part of a "usually" block, or constant.
* Rather than tracking widths, have a MSB...LSB of this expression
(or better, a bitmask of bits relevant in this expression)
* Track recirculation and convert into clock-enables
* Clock enables should become new clocking domains for speed
* If flopped(a) & flopped(b) and no other a&b, then instead flop(a&b).
* Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)
* Performance:
** Latch optimizations
** Constant propagation
Extra cleaning AND: 1 & ((VARREF >> 1) | ((&VARREF >> 1) & VARREF))
Extra shift (perhaps due to clean): if (1 & CAST (VARREF >> #))
** Gated clock and latch conversion to flops. [JeanPaul Vanitegem]
Could propagate the AND into pos/negedges and let domaining optimize.
** Negedge reset
Switch to remove negedges that don't matter
Can't remove async resets from control flops (like in synchronizers)
** If all references to array have a constant index, blow up into separate signals-per-index
** Bit-multiply for faster bit swapping and a=b[1,3,2] random bit reorderings.
** Move _last sets and all other combo logic inside master
if() that triggers on all possible sense items
** Rewrite and combine V3Life, V3Subst
If block temp only ever set in one place to constant, propagate it
Used in t_mem for array delayed assignments
Replace variables if set later in same cfunc branch
See for example duplicate sets of _narrow in cycle 90/91 of t_select_plusloop
** Same assignment on both if branches
"if (a) { ... b=2; } else { ... b=2;}" -> "b=2; if ..."
Careful though, as b could appear in the statement or multiple times in statement
(Could just require exactly two 'b's in statement)
** Simplify XOR/XNOR/AND/OR bit selection trees
Foo = A[1] ^ A[2] ^ A[3] etc are better as ^ ( A & 32'b...1110 )
** Combine variables into wider elements
Parallel statements on different bits should become single signal
Variables that are always consumed in "parallel" can be joined
** Duplicate assignments in gate optimization
Common to have many separate posedge blocks, each with identical
reset_r <= rst_in
** If signal is used only once (not counting trace), always gate substitute
Don't merge if any combining would form circ logic (out goes back to in)
** Multiple assignments each bit can become single assign with concat
Make sure a SEL of a CONCAT can get the single bit back.
** Usually blocks/values
Enable only after certain time, so VL_TIME_I(32) > 0x1e gets eliminated out
** Better ordering of a<=b, b<=c, put all refs to 'b' next to each other to optimize caching
** I-cache packing improvements (what/how?)
** Data cache organization (order of vars in class)
First have clocks,
then bools instead of uint32_t's
then based on what sense list they come from, all outputs, then all inputs
finally have any signals part of a "usually" block, or constant.
** Rather than tracking widths, have a MSB...LSB of this expression
(or better, a bitmask of bits relevant in this expression)
** Track recirculation and convert into clock-enables
** Clock enables should become new clocking domains for speed
** If flopped(a) & flopped(b) and no other a&b, then instead flop(a&b).
** Sort by output bitselects so can combine more assignments (see DDP example dx_dm signal)
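
Note on the "Simplify XOR/XNOR/AND/OR bit selection trees" item: the identity it relies on can be checked with a small C++ sketch. This is not Verilator code. The function names are hypothetical, and it assumes A is a 32-bit value with bits 1 through 3 selected, as in the TODO's own example. It only shows that XORing individually selected bits gives the same result as a reduction XOR over the masked word.

```cpp
#include <cassert>
#include <cstdint>

// Bit-by-bit form, as the TODO writes it: A[1] ^ A[2] ^ A[3].
// (Hypothetical helper name, for illustration only.)
inline bool xorTreeNaive(uint32_t a) {
    return (((a >> 1) ^ (a >> 2) ^ (a >> 3)) & 1u) != 0;
}

// Reduction XOR (parity) of a 32-bit word, computed by folding the
// upper half onto the lower half until one bit remains.
inline bool redXor32(uint32_t v) {
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1u) != 0;
}

// Masked form: ^(A & 32'b...1110). One AND plus one reduction
// replaces the chain of single-bit selects.
inline bool xorTreeMasked(uint32_t a) {
    return redXor32(a & 0xEu);
}

// Compares both forms over every value of the low 8 bits; bits
// outside the mask must not affect the result.
inline void checkEquivalence() {
    for (uint32_t a = 0; a < 256; ++a) {
        assert(xorTreeNaive(a) == xorTreeMasked(a));
        assert(xorTreeNaive(a | 0xFFFFFF00u) == xorTreeMasked(a | 0xFFFFFF00u));
    }
}
```

Calling checkEquivalence() from a test harness exercises both forms. Whether the masked form is actually faster would depend on how the reduction is emitted, which this sketch does not address.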