verilator/test_regress/t/t_trace_complex_structs_fst.out

$date
Wed Feb 23 00:01:19 2022
$end
$version
fstWriter
$end
$timescale
1ps
$end
$scope module top $end
$var wire 1 ! clk $end
$scope module t $end
$var wire 1 ! clk $end
$var integer 32 " cyc [31:0] $end
$scope struct v_strp $end
$var logic 1 # b1 $end
$var logic 1 $ b0 $end
$upscope $end
$scope struct v_strp_strp $end
$scope struct x1 $end
$var logic 1 % b1 $end
$var logic 1 & b0 $end
$upscope $end
$scope struct x0 $end
$var logic 1 ' b1 $end
$var logic 1 ( b0 $end
$upscope $end
$upscope $end
$scope union v_unip_strp $end
$scope struct x1 $end
$var logic 1 ) b1 $end
$var logic 1 * b0 $end
$upscope $end
$scope struct x0 $end
$var logic 1 ) b1 $end
$var logic 1 * b0 $end
$upscope $end
$upscope $end
$var logic 2 + v_arrp [2:1] $end
$var logic 2 , v_arrp_arrp[3] [2:1] $end
$var logic 2 - v_arrp_arrp[4] [2:1] $end
$scope struct v_arrp_strp[3] $end
$var logic 1 . b1 $end
$var logic 1 / b0 $end
$upscope $end
$scope struct v_arrp_strp[4] $end
$var logic 1 0 b1 $end
$var logic 1 1 b0 $end
$upscope $end
$var logic 1 2 v_arru[1] $end
$var logic 1 3 v_arru[2] $end
$var logic 1 4 v_arru_arru[3][1] $end
$var logic 1 5 v_arru_arru[3][2] $end
$var logic 1 6 v_arru_arru[4][1] $end
$var logic 1 7 v_arru_arru[4][2] $end
$var logic 2 8 v_arru_arrp[3] [2:1] $end
$var logic 2 9 v_arru_arrp[4] [2:1] $end
$scope struct v_arru_strp[3] $end
$var logic 1 : b1 $end
$var logic 1 ; b0 $end
$upscope $end
$scope struct v_arru_strp[4] $end
$var logic 1 < b1 $end
$var logic 1 = b0 $end
$upscope $end
$var real 64 > v_real $end
$var real 64 ? v_arr_real[0] $end
$var real 64 @ v_arr_real[1] $end
$scope struct v_str32x2[0] $end
$var logic 32 A data [31:0] $end
$upscope $end
$scope struct v_str32x2[1] $end
$var logic 32 B data [31:0] $end
$attrbegin misc 07 t.enumed_t 4 ZERO ONE TWO THREE 00000000000000000000000000000000 00000000000000000000000000000001 00000000000000000000000000000010 00000000000000000000000000000011 1 $end
$upscope $end
$attrbegin misc 07 "" 1 $end
$var logic 32 C v_enumed [31:0] $end
$attrbegin misc 07 "" 1 $end
$var logic 32 D v_enumed2 [31:0] $end
$attrbegin misc 07 t.enumb_t 4 BZERO BONE BTWO BTHREE 000 001 010 011 2 $end
$attrbegin misc 07 "" 2 $end
$var logic 3 E v_enumb [2:0] $end
$scope struct v_enumb2_str $end
$attrbegin misc 07 "" 2 $end
$var logic 3 F a [2:0] $end
$attrbegin misc 07 "" 2 $end
$var logic 3 G b [2:0] $end
$upscope $end
$var logic 8 H unpacked_array[-2] [7:0] $end
$var logic 8 I unpacked_array[-1] [7:0] $end
$var logic 8 J unpacked_array[0] [7:0] $end
$var bit 1 K LONGSTART_a_very_long_name_which_will_get_hashed_a_very_long_name_which_will_get_hashed_a_very_long_name_which_will_get_hashed_a_very_long_name_which_will_get_hashed_LONGEND $end
$scope module unnamedblk1 $end
$var integer 32 L b [31:0] $end
$scope module unnamedblk2 $end
$var integer 32 M a [31:0] $end
$upscope $end
$upscope $end
$upscope $end
$scope module $unit $end
$var bit 1 N global_bit $end
$upscope $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
1N
b00000000000000000000000000000000 M
b00000000000000000000000000000000 L
0K
b00000000 J
b00000000 I
b00000000 H
b000 G
b000 F
b000 E
b00000000000000000000000000000000 D
b00000000000000000000000000000000 C
b00000000000000000000000000000000 B
b00000000000000000000000011111111 A
r0 @
r0 ?
r0 >
0=
0<
0;
0:
b00 9
b00 8
07
06
05
04
03
02
01
00
0/
0.
b00 -
b00 ,
b00 +
0*
0)
0(
0'
0&
0%
0$
0#
b00000000000000000000000000000000 "
0!
$end
#10
1!
b00000000000000000000000000000001 "
1#
1$
1%
1&
1'
1(
1)
1*
b11 +
b11 ,
b11 -
1.
1/
10
11
b11 8
b11 9
1:
1;
1<
1=
r0.1 >
r0.2 ?
r0.3 @
b00000000000000000000000011111110 A
b00000000000000000000000000000001 B
b00000000000000000000000000000001 C
b00000000000000000000000000000010 D
b111 E
b00000000000000000000000000000101 L
b00000000000000000000000000000101 M
#15
0!
#20
1!
b110 E
b00000000000000000000000000000100 D
b00000000000000000000000000000010 C
b00000000000000000000000000000010 B
b00000000000000000000000011111101 A
r0.6 @
r0.4 ?
r0.2 >
0=
0<
0;
0:
b00 9
b00 8
01
00
0/
0.
b00 -
b00 ,
b00 +
0*
0)
0(
0'
0&
0%
0$
0#
b00000000000000000000000000000010 "
b111 F
b111 G
#25
0!
#30
1!
b110 G
b110 F
b00000000000000000000000000000011 "
1#
1$
1%
1&
1'
1(
1)
1*
b11 +
b11 ,
b11 -
1.
1/
10
11
b11 8
b11 9
1:
1;
1<
1=
r0.3 >
r0.6000000000000001 ?
r0.8999999999999999 @
b00000000000000000000000011111100 A
b00000000000000000000000000000011 B
b00000000000000000000000000000011 C
b00000000000000000000000000000110 D
b101 E
#35
0!
#40
1!
b100 E
b00000000000000000000000000001000 D
b00000000000000000000000000000100 C
b00000000000000000000000000000100 B
b00000000000000000000000011111011 A
r1.2 @
r0.8 ?
r0.4 >
0=
0<
0;
0:
b00 9
b00 8
01
00
0/
0.
b00 -
b00 ,
b00 +
0*
0)
0(
0'
0&
0%
0$
0#
b00000000000000000000000000000100 "
b101 F
b101 G
#45
0!
#50
1!
b100 G
b100 F
b00000000000000000000000000000101 "
1#
1$
1%
1&
1'
1(
1)
1*
b11 +
b11 ,
b11 -
1.
1/
10
11
b11 8
b11 9
1:
1;
1<
1=
r0.5 >
r1 ?
r1.5 @
b00000000000000000000000011111010 A
b00000000000000000000000000000101 B
b00000000000000000000000000000101 C
b00000000000000000000000000001010 D
b011 E
#55
0!
#60
1!
b010 E
b00000000000000000000000000001100 D
b00000000000000000000000000000110 C
b00000000000000000000000000000110 B
b00000000000000000000000011111001 A
r1.8 @
r1.2 ?
r0.6 >
0=
0<
0;
0:
b00 9
b00 8
01
00
0/
0.
b00 -
b00 ,
b00 +
0*
0)
0(
0'
0&
0%
0$
0#
b00000000000000000000000000000110 "
b011 F
b011 G