verilator

mirror of https://github.com/verilator/verilator.git synced 2025-01-08 23:57:35 +00:00

Author	SHA1	Message	Date
Geza Lore	38a8d7fb2e	Remove redundant 'inline' keywords from definitions Also add checks to t/t_dist_cppstyle	2022-09-16 15:52:25 +01:00
Geza Lore	9ac64d0b92	Improve performance of MTask coarsening Various optimizations to speed up MTasks coarsening (which is the long pole in the multi-threaded scheduling of very large designs). The biggest impact ones: - Use efficient hand written Pairing Heaps for implementing priority queues and the scoreboard, instead of the old SortByValueMap. This helps us avoid having to sort a lot of merge candidates that we will never actually consider and helps a lot in performance. - Remove unnecessary associative containers and store data structures (the heap nodes in particular) directly in the object they relate to. This eliminates a huge amount of lookups and helps a lot in performance. - Distribute storage for SiblingMC instances into the LogicMTask instances, and combine with the sibling maps. This again eliminates hash table lookups and makes storage structures smaller. - Remove some now bidirectional edge maps, keep only the forward map. There are also some other smaller optimizations: - Replaced more unnecessary dynamic_casts with static_casts - Templated some functions/classes to reduce the number of static branches in loops. - Improves sorting of edges for sibling candidate creation - Various micro-optimizations here and there This speeds up MTask coarsening by 3.8x on a large design, which translates to a 2.5x speedup of the ordering pass in multi-threaded mode. (Combined with the earlier optimizations, ordering is now 3x faster.) Due to the elimination of a lot of the auxiliary data structures, and ensuring a minimal size for the necessary ones, memory consumption of the MTask coarsening is also reduced (measured up to 4.4x reduction though the accuracy of this is low). The algorithm is identical except for minor alterations of the order some candidates are added or removed, this can cause perturbation in the output due to tied scores being broken based on IDs.	2022-08-20 21:18:50 +01:00
Wilson Snyder	1e2219347e	Internals: Cleanup ifdef, move up not under compiler version ifdef	2022-08-11 17:41:43 -04:00
Geza Lore	96a4b3e5a5	Update clang-format config and apply - Regroup and sort #include directives (like we used to, but automatic) - Set AlwaysBreakTemplateDeclarations to true	2022-08-05 12:00:24 +01:00
Geza Lore	38e5b6c1ad	Replace __gcov_flush with __gcov_dump __gcov_flush was a private function and was removed from later GCC versions (at least from 11.2.0, possibly earlier). Replace with the documented public __gcov_dump.	2022-07-30 16:02:03 +01:00
Wilson Snyder	b9d7819faa	Internals: Fix some cppcheck issues. Some dump functions fixed.	2022-07-30 10:01:39 -04:00
Geza Lore	0de1bbc85b	Add and use VL_CONSTEXPR_CXX17	2022-07-05 14:21:28 +01:00
Geza Lore	b51f887567	Perform VCD tracing in parallel when using --threads (#3449 ) VCD tracing is now parallelized using the same thread pool as the model. We achieve this by breaking the top level trace functions into multiple top level functions (as many as --threads), and after emitting the time stamp to the VCD file on the main thread, we execute the tracing functions in parallel on the same thread pool as the model (which we pass to the trace file during registration), tracing into a secondary per thread buffer. The main thread will then stitch (memcpy) the buffers together into the output file. This makes the `--trace-threads` option redundant with `--trace`, which now only affects `--trace-fst`. FST tracing uses the previous offloading scheme. This obviously helps a lot in VCD tracing performance, and I have seen better than Amdahl speedup, namely I get 3.9x on XiangShan 4T (2.7x on OpenTitan 4T).	2022-05-29 19:08:39 +01:00
HungMingWu	880a9be3b1	Internal: Add C++20ish reverse_view for range loops. No functional change (#3388 ). Signed-off-by: HungMingWu <u9089000@gmail.com>	2022-04-18 13:03:56 -04:00
Wilson Snyder	e02f97854c	Deprecate 'vluint64_t' and similar types (#3255 ).	2022-03-27 15:27:40 -04:00
Wilson Snyder	3f7bf3d2dc	Fix MSVC localtime_s (#3124 ).	2022-03-27 13:59:18 -04:00
Geza Lore	b1b5b5dfe2	Improve run-time profiling The --prof-threads option has been split into two independent options: 1. --prof-exec, for collecting verilator_gantt and other execution related profiling data, and 2. --prof-pgo, for collecting data needed for PGO The implementation of execution profiling is extricated from VlThreadPool and is now a separate class VlExecutionProfiler. This means --prof-exec can now be used for single-threaded models (though it does not measure a lot of things just yet). For consistency VerilatedProfiler is renamed VlPgoProfiler. Both VlExecutionProfiler and VlPgoProfiler are in verilated_profiler.{h/cpp}, but can be used completely independently. Also re-worked the execution profile format so it now only emits events without holding onto any temporaries. This is in preparation for some future optimizations that would be hindered by the introduction of function locals via AstText. Also removed the Barrier event. Clearing the profile buffers is not notably more expensive as the profiling records are trivially destructible.	2022-03-27 15:57:30 +02:00
Xi Zhang	14d24213a8	Support LoongArch ISA multithreading (#3353 ) (#3354 )	2022-03-17 09:04:47 -04:00
Wilson Snyder	321880f5a6	Add trace dumpvars() call for selective runtime tracing (#3322 ).	2022-03-05 15:44:32 -05:00
Wilson Snyder	50094ca296	Internals: Add cpplint control file and related cleanups	2022-01-09 16:49:38 -05:00
Wilson Snyder	4cd56b1fb9	Use C++11 standard types for MacOS portability (#3254 ) (#3257 ).	2022-01-01 16:04:20 -05:00
Wilson Snyder	ca42be982c	Copyright year update.	2022-01-01 08:26:40 -05:00
Wilson Snyder	560b59f97f	Use C++11 standard types for MacOS portability (#3254 ).	2021-12-21 13:18:05 -05:00
Wilson Snyder	293a5f402b	Fix timescale portability on Arm64 (#3222 ).	2021-11-28 15:47:19 -05:00
Wilson Snyder	55da66164b	Fix verilator_gantt time on Arm.	2021-10-04 22:13:34 -04:00
Wilson Snyder	959793cde3	Internals: Cleanup VL_VALUE_STRING_MAX widths (#3050 ).	2021-08-23 21:13:33 -04:00
Wilson Snyder	3718fe1ca1	Commentary (trigger rebuild)	2021-05-13 18:34:20 -04:00
Yutetsu TAKATSUKASA	9797af0ad4	Introduce a macro VL_ATTR_NO_SANITIZE_ALIGN to suppress unaligned access check in ubsan (#2929 ) * Add VL_ATTR_NO_SANITIZE_ALIGN macro to disable alignment check of ubsan * Mark a function VL_ATTR_NO_SANITIZE_ALIGN because the function is intentionally using unaligned access for the sake of performance. Co-authored-by: Wilson Snyder <wsnyder@wsnyder.org>	2021-05-08 07:16:40 +09:00
HyungKi Jeong	0d6099b2b7	Fix MinGW not supporting 'localtime_r'. (#2882 )	2021-04-09 10:40:41 -04:00
Wilson Snyder	8992e2ec02	Commentary	2021-03-28 11:50:05 -04:00
Wilson Snyder	e9b5721fb0	Internals: Remove VL_FUNC as __func__ part of C++11	2021-03-28 11:14:51 -04:00
Wilson Snyder	ca01d6f18d	Internals: Add some std::'s. No functional change intended.	2021-03-26 21:23:18 -04:00
Wilson Snyder	2e158d88c1	Commentary. Remove dox comments from private members.	2021-03-20 21:11:53 -04:00
Wilson Snyder	a1ab295b74	Commentary: Cleanup all include/* header comments.	2021-03-20 17:46:00 -04:00
Wilson Snyder	8c3ad591ae	Internals: Add additional mutex exclusion checks. No functional change.	2021-03-06 18:29:11 -05:00
Wilson Snyder	47dcbd4b8a	Internal: Remove deprecated/insecure functions. No functional change intended.	2021-03-06 10:34:03 -05:00
Wilson Snyder	018d994781	Convert VPI to singleton, part of (#2660 ).	2021-03-04 19:23:40 -05:00
Wilson Snyder	be31fdcfe4	Use Google-style-guide header guard naming, to avoid __ prefix.	2021-03-03 21:57:07 -05:00
Wilson Snyder	8c2ee6c5ab	With -DVL_NO_LEGACY hide all outdated API routines	2021-02-22 22:59:23 -05:00
Wilson Snyder	bd602d0e2d	Copyright year update	2021-01-01 10:29:54 -05:00
Wilson Snyder	941e5c659a	Fix cppcheck parse error	2020-12-23 15:22:02 -05:00
Wilson Snyder	b6ded59c2b	Internals: Use and enforce class final for ~5% performance boost.	2020-11-18 21:32:16 -05:00
Wilson Snyder	698e0fbbd1	configure: Try compiler flags to get to C++11 (#2502 )	2020-08-17 07:40:07 -04:00
Wilson Snyder	ee9d6dd63f	C++11: Favor auto, range for. No functional change intended.	2020-08-16 11:44:06 -04:00
Wilson Snyder	c0127599df	C++11: Use nullptr. No functional change.	2020-08-16 11:44:05 -04:00
Wilson Snyder	7c54a451a9	C++11: Remove pre-c11 VL_OVERRIDE etc. No functional change.	2020-08-16 11:44:05 -04:00
Wilson Snyder	f3b28c5c74	Remove configure --enable-prec11-final	2020-08-15 09:39:59 -04:00
Wilson Snyder	b1495f0742	Commentary (#2423 )	2020-06-14 10:44:57 -04:00
Wilson Snyder	c5d61da5d2	Internal coverage: Fix coverage of tests that abort. No functional change intended.	2020-06-05 08:00:22 -04:00
Wilson Snyder	b60a92eed3	Fix pre-C11 compiler warning.	2020-05-30 21:11:37 -04:00
Wilson Snyder	4cfa3f879a	Internals: Allow VL_DANGLING on pointer const.	2020-05-29 18:31:53 -04:00
Wilson Snyder	5089ac6119	Remove VL_ULL as ULL now in MSVC & C++11	2020-05-28 20:32:07 -04:00
Wilson Snyder	9fd4541069	Fix reduction OR on wide data, broke in v4.026, #2300 .	2020-04-30 17:53:54 -04:00
Geza Lore	aa9cde22c8	Use SIMD intrinsics to render VCD traces (#2289 ) Use SIMD intrinsics to render VCD traces. I have measured 10-40% single threaded performance increase with VCD tracing on SweRV EH1 and lowRISC Ibex using SSE2 intrinsics to render the trace. Also helps a tiny bit with FST, but now almost all of the FST overhead is in the FST library. I have reworked the tracing routines to use more precisely sized arguments. The nice thing about this is that the performance without the intrinsics is pretty much the same as it was before, as we do at most 2x as much work as necessary, but in exchange there are no data dependent branches at all.	2020-04-30 00:09:09 +01:00
Wilson Snyder	df52e481fb	Collected minor output code cleanups.	2020-04-23 21:22:47 -04:00


149 Commits