Commit Graph

3487 Commits

Author SHA1 Message Date
Tim Snyder
a57262d6e7
Fix use /usr/bin/env perl in lieu of /usr/bin/perl (#2306)
Enables scripts to work where perl is not installed at /usr/bin/perl
2020-05-04 18:42:15 -04:00
Geza Lore
fe708f045a Fix Travis oddity 2020-05-04 00:21:07 +01:00
Geza Lore
8afcd67a1f Fix FST tracing of little endian vectors 2020-05-03 22:39:45 +01:00
Wilson Snyder
16e258eb4c Commentary: reformat changes, relocate announcements 2020-05-03 16:10:02 -04:00
Wilson Snyder
d6fbbddac9 devel release 2020-05-03 11:18:53 -04:00
Wilson Snyder
9dc65df982 Version bump 2020-05-03 11:14:37 -04:00
Wilson Snyder
8f64e4a76f Support $root, #2150. 2020-05-02 08:29:20 -04:00
John Demme
6e9008fb5a
Fix VerilatedVarProps::totalSize missing the first unpacked dim (#2296) 2020-05-01 07:42:29 -04:00
Wilson Snyder
5ded80cf79 Fix MacOs Homebrew by removing default LIBS, #2298. 2020-04-30 19:53:21 -04:00
Wilson Snyder
a6deee2083 Fix clock enables with bit-extends, #2299. 2020-04-30 19:22:58 -04:00
Wilson Snyder
e1f45dd84a Commentary 2020-04-30 18:52:32 -04:00
Wilson Snyder
9fd4541069 Fix reduction OR on wide data, broke in v4.026, #2300. 2020-04-30 17:53:54 -04:00
Geza Lore
849487da23
Modify --build to be a standalone option (#2294)
- Issue an error when --build is used together with --make
- When given --build, always use GNU Make to perform the build
- Update documentation (examples were good as they were)
- Remove the broken t_flag_build_cmake test

Fixes #2280
2020-04-30 12:54:50 +01:00
Peter Horvath
dc64b43152
Fix xcode clang bug workaround (#2295) 2020-04-30 07:20:31 -04:00
Geza Lore
209a585a68 Remove VL_NEGATE_{I,Q,E}, use C native unary '-' instead
This is to avoid slowing down -O0 models unnecessarily.
2020-04-30 01:05:52 +01:00
Geza Lore
aa9cde22c8
Use SIMD intrinsics to render VCD traces (#2289)
Use SIMD intrinsics to render VCD traces.

I have measured 10-40% single threaded performance increase with VCD
tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render
the trace. Also helps a tiny bit with FST, but now almost all of the FST
overhead is in the FST library.

I have reworked the tracing routines to use more precisely sized
arguments. The nice thing about this is that the performance without the
intrinsics is pretty much the same as it was before, as we do at most 2x
as much work as necessary, but in exchange there are no data dependent
branches at all.
2020-04-30 00:09:09 +01:00
Wilson Snyder
faf7255e83 Commentary 2020-04-29 18:42:28 -04:00
Wilson Snyder
a6c97cc3bc Commentary 2020-04-29 18:41:00 -04:00
Wilson Snyder
977ceb404d Mention -march-native, #2289. 2020-04-29 18:10:43 -04:00
Wilson Snyder
b44efe7ef7 Use 'suggest' for consistent wording. 2020-04-28 21:19:19 -04:00
Wilson Snyder
15ad3f46be Fix logical not optimization with empty begin, #2291. 2020-04-28 21:15:20 -04:00
Wilson Snyder
c6d1a9858a Use clang-format 10.0.0 2020-04-28 18:47:59 -04:00
Wilson Snyder
77cc335c2d Commentary 2020-04-28 18:47:49 -04:00
Wilson Snyder
a95eb53684 Commentary (#2290). 2020-04-28 18:46:59 -04:00
Wilson Snyder
910803e6db Fix error on unpacked connecting to packed, #2288. 2020-04-27 18:38:54 -04:00
Geza Lore
dd967f7769 Improve trace buffer memory utilization and performance.
Convert trace buffer to 32-bit entries, rather than a union containing a
pointer type. Also tweaked trace entry layouts for a bit more
performance. This gains another 10% on SweRV EH1 CoreMark.
2020-04-27 19:00:17 +01:00
Geza Lore
b79ef672e1 Various minor optimizations of VCD trace routines
- Change templated trace routines to branch table.

Removed templating from trace chgBus and fullBus and replaced them with
a branch table like the other there is a very small (< 1%) penalty for
this on SwerRV EH1 CoreMark, but this is less than the variability of
disk IO so it's worth it to keep the code simpler and smaller.

- Prefetch VCD suffix buffer at the top of emit*

- Increase ILP in VCD emit* routines

- Use a 64-bit unaligned store to emit the VCD suffix (on x86 only)

The performance difference with these is very small, but the changes
hopefully make this code more performance-portable across various
micro-architectures.
2020-04-27 18:44:53 +01:00
Wilson Snyder
70549e1a64 Internals: Parse lifetime directives; still unsupported. 2020-04-26 12:45:06 -04:00
Wilson Snyder
87e1c36e4a Support event data type (with some restrictions). 2020-04-25 15:37:46 -04:00
Geza Lore
9991b19610 Another attempt at flushing threaded VCD correctly. 2020-04-25 18:40:09 +01:00
Geza Lore
1381d3dbde
Add -Wno-tautological-bitwise-compare to model build flags (#2285)
This appeases Clang 10.
2020-04-25 16:29:42 +01:00
Wilson Snyder
5e575f5906 Fix line numbers in tables. 2020-04-24 19:34:26 -04:00
Wilson Snyder
e2cbcd37d8 Commentary 2020-04-24 18:43:02 -04:00
Wilson Snyder
3b37b5b92d Tests: Check output from some unsupported tests. 2020-04-24 08:22:19 -04:00
Geza Lore
10b4678ee6 Make vgen.pl deterministic 2020-04-24 04:53:33 +01:00
Geza Lore
c1665818b9
Fix missing flush with threaded VCD tracing. (#2282)
VerilatedVcdC::openNext() failed to flush the tracing thread before
opening the next output file, which caused t_trace_cat.pl to fail
with --vltmt on occasion.
2020-04-24 03:09:26 +01:00
Geza Lore
27f4399c31
Fix tests failing on rerun after passing from clean. (#2281) 2020-04-23 21:27:06 -04:00
Wilson Snyder
df52e481fb Collected minor output code cleanups. 2020-04-23 21:22:47 -04:00
Wilson Snyder
f93ae707e0 Tests: Add bad option test. 2020-04-23 19:56:26 -04:00
Geza Lore
6ed10b7fde
Fix --protect-lib generated library link rules (#2279)
We used to include a .cpp file on the link line for the shared library,
which was ignored, but generated a .d file for the .so which contained
the header files required by the .cpp file. This then caused a rebuild
where we included the .d in verilated.mk to included in the .h headers
among the prerequisites of the .so, yielding a clang error about treating
.h files as c++-header rather than c-header... Long story short, we don't
do that anymore. This used t cause t_a4_examples to fail on occasion.

Note there is no need for a separate compilation rule for the
<--protect-lib>.cpp, as it will jsut pick up the standard OPT_FAST rule.
2020-04-23 17:30:23 -04:00
Geza Lore
8208fe8a0e
Fix test failures on Ubuntu 20.04 (#2278)
- Packaged SystemC lives in /usr so needed to update regex in test
driver
- Clang 10 complains about mixed named and positional initializers in
struct definitions.
2020-04-23 17:29:37 -04:00
Wilson Snyder
ace35b3e81 Tests: Add -G test. 2020-04-23 08:05:14 -04:00
Wilson Snyder
2b58e834ee Tests: Rename IVERILOG define for consistency. No functional change. 2020-04-23 08:05:14 -04:00
Qingyao Sun
14643643c9
Fix compatibility problem with CMake policy CMP0025 (#2277)
Signed-off-by: Qingyao Sun <sunqingyao19970825@icloud.com>
2020-04-23 07:14:20 -04:00
Wilson Snyder
7176aee852 Internals: Parse fork and delays, but then still report unsupported. 2020-04-22 21:31:40 -04:00
Geza Lore
c96a43b452
Fix unused variable in VL_READMEM_N (#2274) 2020-04-22 17:25:35 -04:00
Wilson Snyder
77915f78db Add experimental-only option. 2020-04-21 20:45:23 -04:00
Geza Lore
c52f3349d1
Initial implementation of generic multithreaded tracing (#2269)
The --trace-threads option can now be used to perform tracing on a
thread separate from the main thread when using VCD tracing (with
--trace-threads 1). For FST tracing --trace-threads can be 1 or 2, and
--trace-fst --trace-threads 1 is the same a what --trace-fst-threads
used to be (which is now deprecated).

Performance numbers on SweRV EH1 CoreMark, clang 6.0.0, Intel i7-3770 @
3.40GHz, IO to ramdisk, with numactl set to schedule threads on different
physical cores. Relative speedup:

--trace     ->  --trace --trace-threads 1      +22%
--trace-fst ->  --trace-fst --trace-threads 1  +38% (as --trace-fst-thread)
--trace-fst ->  --trace-fst --trace-threads 2  +93%

Speed relative to --trace with no threaded tracing:
--trace                                 1.00 x
--trace --trace-threads 1               0.82 x
--trace-fst                             1.79 x
--trace-fst --trace-threads 1           1.23 x
--trace-fst --trace-threads 2           0.87 x

This means FST tracing with 2 extra threads is now faster than single
threaded VCD tracing, and is on par with threaded VCD tracing. You do
pay for it in total compute though as --trace-fst --trace-threads 2 uses
about 240% CPU vs 150% for --trace-fst --trace-threads 1, and 155% for
--trace --trace threads 1. Still for interactive use it should be
helpful with large designs.
2020-04-21 23:49:07 +01:00
Wilson Snyder
6ab51de96d Fix dockerfile 2020-04-21 18:37:53 -04:00
James Hanlon
97cbc10925 Add --flaten for use with --xml-only (#2270). 2020-04-21 18:14:08 -04:00