verilator/src
Geza Lore dc5c259069
Improve tracing performance. (#2257)

Various tactics used to improve performance of both VCD and FST tracing:
- Both: Change tracing functions to templates that take variable widths
  as template parameters. For VCD, subsequently specialize these to the
  values used by Verilator. This avoids redundant instructions and
  hard-to-predict branches.
- Both: Check for value changes via direct pointer access into the
  previous signal value buffer. This eliminates a lot of simple pointer
  arithmetic instructions from the tracing code.
- Both: Verilator provides clean input, so there is no need to mask out
  unused bits.
- VCD: Pre-compute identifier codes and use a memory copy instead of
  re-computing them every time a code is emitted. This saves a lot of
  instructions and hard-to-predict branches. The added D-cache misses
  are cheaper than the removed branches/instructions.
- VCD: Rewrite the routines that emit the changes to be more efficient.
- FST: Use the previous signal value buffer the same way as the VCD
  tracing code, and only call the FST API when a change is detected.
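
The tactics above can be illustrated with a minimal sketch. This is not the code from this commit: SketchTracer, chgBus, makeCode and the member names are hypothetical, and the identifier encoding is an assumption chosen only to produce printable VCD-style codes. It shows the signal width as a template parameter, change detection through a direct pointer into the previous-value buffer, no masking of the (assumed clean) input, and identifier codes computed once and then copied.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified tracer. Not Verilator's actual tracing classes.
class SketchTracer {
    std::vector<uint32_t> m_prev;      // Previous value of every traced signal
    std::vector<std::string> m_codes;  // Identifier codes, computed once up front
    std::string m_out;                 // Stand-in for the VCD output buffer

    // Assumed encoding: base-94 digits using printable characters '!'..'~'.
    static std::string makeCode(uint32_t index) {
        std::string code;
        do {
            code += static_cast<char>('!' + index % 94);
            index /= 94;
        } while (index != 0);
        return code;
    }

public:
    explicit SketchTracer(size_t nSignals)
        : m_prev(nSignals, 0) {
        m_codes.reserve(nSignals);
        for (size_t i = 0; i < nSignals; ++i) {
            m_codes.push_back(makeCode(static_cast<uint32_t>(i)));
        }
    }

    // The width is a compile-time constant, so the bit loop has a fixed trip
    // count and can be unrolled per instantiation.
    template <int T_Bits>
    void chgBus(uint32_t index, uint32_t newval) {
        static_assert(T_Bits >= 1 && T_Bits <= 32, "sketch handles 1..32 bits");
        // Direct pointer into the previous-value buffer.
        uint32_t* const oldp = &m_prev[index];
        // Input is assumed clean (no stray upper bits), so no masking here.
        if (*oldp == newval) return;  // Nothing changed, emit nothing
        *oldp = newval;
        // Emit "b<bits> <code>\n"
        char buf[T_Bits + 2];
        buf[0] = 'b';
        for (int i = 0; i < T_Bits; ++i) {
            buf[1 + i] = static_cast<char>('0' + ((newval >> (T_Bits - 1 - i)) & 1));
        }
        buf[T_Bits + 1] = ' ';
        m_out.append(buf, T_Bits + 2);
        m_out.append(m_codes[index]);  // Copy the pre-computed code
        m_out += '\n';
    }

    const std::string& output() const { return m_out; }
};
```

In this sketch every previous value starts at zero, so a first sample of zero emits nothing; a real tracer would dump all initial values separately. Calling chgBus<4>(0, 0xA) twice in a row appends a line only on the first call.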

Performance as measured on SweRV EH1, with the pre-canned CoreMark
benchmark running from DCCM/ICCM, clang 6.0.0, Intel i7-3770 @ 3.40GHz,
and IO to ramdisk:

            +--------------+---------------+----------------------+
            | VCD          | FST           | FST separate thread  |
            | (--trace)    | (--trace-fst) | (--trace-fst-thread) |
------------+--------------+---------------+----------------------+
Before      |  30.2 s      | 121.1 s       |  69.8 s              |
============+==============+===============+======================+
After       |  24.7 s      |  45.7 s       |  32.4 s              |
------------+--------------+---------------+----------------------+
Speedup     |    22 %      |   165 %       |   115 %              |
------------+--------------+---------------+----------------------+
Rel. to VCD |     1 x      |  1.85 x       |  1.31 x              |
------------+--------------+---------------+----------------------+

In addition, the FST trace size for the above run was reduced by 48%.
2020-04-14 00:13:10 +01:00
..
.gdbinit
.gitignore
astgen
bisonpre
config_build.h.in
config_rev.pl
cppcheck_filtered
flexfix
Makefile_obj.in
Makefile.in
mkinstalldirs
pod2latexfix
V3Active.cpp
V3Active.h
V3ActiveTop.cpp
V3ActiveTop.h
V3Assert.cpp
V3Assert.h
V3AssertPre.cpp
V3AssertPre.h
V3Ast.cpp
V3Ast.h
V3AstConstOnly.h
V3AstNodes.cpp
V3AstNodes.h
V3Begin.cpp
V3Begin.h
V3Branch.cpp
V3Branch.h
V3Broken.cpp
V3Broken.h
V3Case.cpp
V3Case.h
V3Cast.cpp
V3Cast.h
V3CCtors.cpp
V3CCtors.h
V3Cdc.cpp
V3Cdc.h
V3Changed.cpp
V3Changed.h
V3Class.cpp
V3Class.h
V3Clean.cpp
V3Clean.h
V3Clock.cpp
V3Clock.h
V3Combine.cpp
V3Combine.h
V3Config.cpp
V3Config.h
V3Const.cpp
V3Const.h
V3Coverage.cpp
V3Coverage.h
V3CoverageJoin.cpp
V3CoverageJoin.h
V3CUse.cpp
V3CUse.h
V3Dead.cpp
V3Dead.h
V3Delayed.cpp
V3Delayed.h
V3Depth.cpp
V3Depth.h
V3DepthBlock.cpp
V3DepthBlock.h
V3Descope.cpp
V3Descope.h
V3EmitC.cpp
V3EmitC.h
V3EmitCBase.h
V3EmitCInlines.cpp
V3EmitCMake.cpp
V3EmitCMake.h
V3EmitCSyms.cpp
V3EmitMk.cpp
V3EmitMk.h
V3EmitV.cpp
V3EmitV.h
V3EmitXml.cpp
V3EmitXml.h
V3Error.cpp
V3Error.h
V3Expand.cpp
V3Expand.h
V3File.cpp
V3File.h
V3FileLine.cpp
V3FileLine.h
V3Gate.cpp
V3Gate.h
V3GenClk.cpp
V3GenClk.h
V3Global.cpp
V3Global.h Internals: cppcheck 1.90 fixes. No functional change intended. 2020-04-05 18:57:47 -04:00
V3Graph.cpp
V3Graph.h
V3GraphAcyc.cpp
V3GraphAlg.cpp
V3GraphAlg.h
V3GraphDfa.cpp
V3GraphDfa.h
V3GraphPathChecker.cpp
V3GraphPathChecker.h
V3GraphStream.h
V3GraphTest.cpp
V3Hashed.cpp
V3Hashed.h
V3Inline.cpp
V3Inline.h
V3Inst.cpp
V3Inst.h
V3InstrCount.cpp
V3InstrCount.h
V3LangCode.h
V3LanguageWords.h
V3Life.cpp
V3Life.h
V3LifePost.cpp
V3LifePost.h
V3LinkCells.cpp
V3LinkCells.h
V3LinkDot.cpp
V3LinkDot.h
V3LinkJump.cpp
V3LinkJump.h
V3LinkLevel.cpp
V3LinkLevel.h
V3LinkLValue.cpp
V3LinkLValue.h
V3LinkParse.cpp
V3LinkParse.h
V3LinkResolve.cpp
V3LinkResolve.h
V3List.h Cleanup misc clang-tidy warnings. No functional change intended 2020-04-03 22:31:54 -04:00
V3Localize.cpp
V3Localize.h
V3Name.cpp
V3Name.h
V3Number_test.cpp
V3Number.cpp
V3Number.h
V3Options.cpp
V3Options.h
V3Order.cpp
V3Order.h
V3OrderGraph.h
V3Os.cpp
V3Os.h
V3Param.cpp
V3Param.h
V3Parse.h
V3ParseGrammar.cpp
V3ParseImp.cpp
V3ParseImp.h
V3ParseLex.cpp
V3ParseSym.h
V3Partition.cpp
V3Partition.h
V3PartitionGraph.h
V3PreLex.h
V3PreLex.l
V3Premit.cpp
V3Premit.h
V3PreProc.cpp
V3PreProc.h
V3PreShell.cpp
V3PreShell.h
V3ProtectLib.cpp
V3ProtectLib.h
V3Reloop.cpp
V3Reloop.h
V3Scope.cpp
V3Scope.h
V3Scoreboard.cpp
V3Scoreboard.h
V3SenTree.h
V3Simulate.h
V3Slice.cpp
V3Slice.h
V3Split.cpp
V3Split.h
V3SplitAs.cpp
V3SplitAs.h
V3SplitVar.cpp
V3SplitVar.h
V3Stats.cpp
V3Stats.h
V3StatsReport.cpp
V3String.cpp
V3String.h
V3Subst.cpp
V3Subst.h
V3SymTable.h
V3Table.cpp
V3Table.h
V3Task.cpp
V3Task.h
V3Trace.cpp
V3Trace.h
V3TraceDecl.cpp
V3TraceDecl.h
V3Tristate.cpp
V3Tristate.h
V3TSP.cpp
V3TSP.h
V3Undriven.cpp
V3Undriven.h
V3Unknown.cpp
V3Unknown.h
V3Unroll.cpp
V3Unroll.h
V3Width.cpp
V3Width.h
V3WidthCommit.h
V3WidthSel.cpp
Verilator.cpp
verilog.l
verilog.y
VlcBucket.h
VlcMain.cpp
VlcOptions.h
vlcovgen
VlcPoint.h
VlcSource.h
VlcTest.h
VlcTop.cpp
VlcTop.h