Commit Graph

31 Commits

Author SHA1 Message Date
Wilson Snyder
e02f97854c Deprecate 'vluint64_t' and similar types (#3255). 2022-03-27 15:27:40 -04:00
Wilson Snyder
321880f5a6 Add trace dumpvars() call for selective runtime tracing (#3322). 2022-03-05 15:44:32 -05:00
Wilson Snyder
4cd56b1fb9 Use C++11 standard types for MacOS portability (#3254) (#3257). 2022-01-01 16:04:20 -05:00
Wilson Snyder
ca42be982c Copyright year update. 2022-01-01 08:26:40 -05:00
Wilson Snyder
7526151670 Fix bad ending address on $readmem (#3205). 2021-12-21 19:55:04 -05:00
Geza Lore
ff425369ac
Reduce .rodata footprint of trace initialization (#3250)
Trace initialization (tracep->decl* functions) used to explicitly pass
the complete hierarchical names of signals as string constants. This
contains a lot of redundancy (path prefixes), does not scale well with
large designs and resulted in .rodata sections (the string constants) in
ELF executables being extremely large.

This patch changes the API of trace initialization that allows pushing
and popping name prefixes as we walk the hierarchy tree, which are
prepended to declared signal names at run-time during trace
initialization. This in turn allows us to emit repeat path/name
components only once, effectively removing all duplicate path prefixes.
On SweRV EH1 this reduces the .rodata section in a --trace build by 94%.

Additionally, trace declarations are now emitted in lexical order by
hierarchical signal names, and the top level trace initialization
function respects --output-split-ctrace.
2021-12-19 15:15:07 +00:00
Wilson Snyder
90102d9867 Commentary 2021-10-25 21:02:02 -04:00
Wilson Snyder
4739956cfe Internals: Add missing const. No functional change. 2021-10-05 21:20:22 -04:00
Wilson Snyder
ab13a2ebdc Internals: Use C++11 const and initializers. No functional change intended. 2021-07-24 08:36:11 -04:00
Wilson Snyder
ca01d6f18d Internals: Add some std::'s. No functional change intended. 2021-03-26 21:23:18 -04:00
Wilson Snyder
a1ab295b74 Commentary: Cleanup all include/* header comments. 2021-03-20 17:46:00 -04:00
Wilson Snyder
9483ebefae Internal code coverage cleanups. 2021-03-07 21:05:15 -05:00
Wilson Snyder
2cad22a22a
Add simulation context (VerilatedContext) (#2660). (#2813)
**   Add simulation context (VerilatedContext) to allow multiple fully independent
      models to be in the same process.  Please see the updated examples.
**   Add context->time() and context->timeInc() API calls, to set simulation time.
      These now are recommended in place of the legacy sc_time_stamp().
2021-03-07 11:01:54 -05:00
Wilson Snyder
caa9c99837 Commentary 2021-03-07 08:28:13 -05:00
Wilson Snyder
47dcbd4b8a Internal: Remove deprecated/insecure functions. No functional change intended. 2021-03-06 10:34:03 -05:00
Wilson Snyder
bd602d0e2d Copyright year update 2021-01-01 10:29:54 -05:00
Wilson Snyder
7d05be802d Misc internal coverage hole and related bug fixes 2020-12-09 19:18:12 -05:00
Wilson Snyder
212e8fb14b Internals: Cleanup some inlines, use constexpr. No functional change intended. 2020-12-01 18:49:03 -05:00
Wilson Snyder
44eb362a18 clang-tidy cleanups. No functional change intended. 2020-11-10 21:40:14 -05:00
Wilson Snyder
72d2cff0a1 C++11: Use member declaration initalizations. No functional change intended. 2020-08-16 11:44:06 -04:00
Wilson Snyder
c0127599df C++11: Use nullptr. No functional change. 2020-08-16 11:44:05 -04:00
Geza Lore
fac89c5d62
Close trace on vl_fatal/vl_finish (#2414)
This is required to get the last bit of FST trace and close the FST file
properly on $stop or assertion failure.
2020-06-12 07:15:42 +01:00
Wilson Snyder
6ce878cb0d Fix some clang-tidy warnings 2020-06-01 23:16:17 -04:00
Wilson Snyder
6a882f9dc6 Internal code coverage improvements. No functional change intended. 2020-05-23 10:34:58 -04:00
Wilson Snyder
c4f31d3bb6 Tracing: Remove dead code. No functional change intended. 2020-05-17 09:52:03 -04:00
Geza Lore
900c023bb5 Refactor trace implementation to allow experimentation
The main goal of this patch is to enable splitting the full and
incremental tracing functions into multiple functions, which can then be
run in parallel at a later stage. It also simplifies further
experimentation as all of the interesting trace code construction now
happens in V3Trace. No functional change is intended by this patch, but
there are some implementation changes in the generated code.

Highlights:
- Pass symbol table directly to trace callbacks for simplicity.
- A new traceRegister function is generated which adds each trace
function as an individual callback, which means we can have multiple
callbacks for each trace function type.
- A new traceCleanup function is generated which clears the activity
flags, as the trace callbacks might be implemented as multiple functions.
- Re-worked sub-function handling so there is no separate sub-function
for each trace activity class. Sub-functions are generate when required
by splitting.
- traceFull/traceChg are now created in V3Trace rather than V3TraceDecl,
this requires carrying the trace value tree in TraceDecl until it
reaches V3Trace where the TraceInc nodes are created (previously a
TraceInc was also created in V3TraceDecl which carries the value).
2020-05-15 18:34:29 +01:00
Geza Lore
aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Geza Lore
dd967f7769 Improve trace buffer memory utilization and performance.
Convert trace buffer to 32-bit entries, rather than a union containing a
pointer type. Also tweaked trace entry layouts for a bit more
performance. This gains another 10% on SweRV EH1 CoreMark.
2020-04-27 19:00:17 +01:00
Geza Lore
b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the other there is a very small (< 1%) penalty for
this on SwerRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Geza Lore
c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
Geza Lore
39d903375b
Factor out trace implementation common to all formats. (#2268)
This patch de-duplicates common functionality between the VCD and FST
trace implementation. It also enables adding new trace formats more
easily and consistently.

No functional nor performance change intended.
2020-04-19 23:57:36 +01:00