The build is now by default configured to link performance critical
libraries (libgcc, libstdc++, libtcmalloc) statically. This improves
Verilation speed by between 4.5-7% based on my measurements as it
eliminates approx 20% of the mispredicted branches from the execution.
With partial static linking, the size of the .text section in
verilator_bin is increased by about 14%, and the binary is itself only
about 800KB bigger on disk, so hopefully this is not a big issue in
exchange for the faster compilation speed. A configure option
"--disable-partial-static" is provided to restore the old behaviour of
linking everything dynamically.
Note: This patch also changes to use libtcmalloc_minimal, which is all
we really need and itself has fewer dependencies.
Replace the virtual type() method on AstNode with a non-virtual, inlined
accessor to a const member variable m_type. This means that in order to be
able to use this for type testing, it needs to be initialized based on the
final type of the node. This is achieved by passing the relevant AstType
value back through the constructor call chain. Most of the boilerplate
involved is auto generated by first feeding V3AstNodes.h through astgen to
get V3AstNodes__gen.h, which is then included in V3Ast.h. No client code
needs to be aware and there is no functional change intended.
Eliminating the virtual function call to fetch the node type identifier
results in measured compilation speed improvement of 5-10% as it
eliminates up to 20% of all mispredicted branches from the execution.
As Verilator continuously allocates and releases small objects (e.g.:
AstNode, V3GraphVertex, V3GraphEdge), it spends a significant amount of
time in malloc/free and friends. This patch adds the --enable-tcmalloc
configure option to link Verilator against the high performance malloc
implementation library libtcmalloc. The default is to use libtcmalloc if
available on the system. Note that there are no source code change, we
are simply replacing the standard library memory allocation functions.
Measured major compilation speed improvement of 27% when running
Verilator with -O3 on a large design.
dynamic_cast can have large run-time cost, so here we implement type
tests for AstNode instances by checking the unique type() property, which
in turn is a constant generated by astgen. For leaf types in the AstNode
type hierarchy, this is a simple equality check. To handle intermediate
types, we generate the type ids of leaf types in a pre-order traversal of
the type hierarchy. This yields contiguous ranges of ids for sub-type
trees, which means we can check membership of a non-leaf type via 2
comparisons against a low and high id. This single patch makes Verilator
itself 6-13% faster (depending on which optimizations are enabled) on a
large design of over 250k lines of Verilog.