1 of 17

sBerti: Enhancing Berti with a Smart Stride Prefetcher for Better Coverage

Jiapeng Zhou, Ben Chen, Kunlin Li, Yun Chen*

February 1, 2026

4th Data Prefetching Championship 2026

The Hong Kong University of Science and Technology (Guangzhou)

2 of 17

Challenges & Opportunities

  • New Traces from Graph, Google Traces v2 and AI-ML workloads
    • A prefetcher may excel on certain workloads (e.g., SPEC17) while degrading performance on others (e.g., AI-ML).
    • Finding new patterns not covered by existing prefetchers
  • New configuration: Limited bandwidth
    • Mechanisms for the prefetcher to automatically self-regulate based on real-time bandwidth

3 of 17

Profiling the baseline (L1 Berti + L2 Pythia)

  • Miss rate analysis
    • High L1D miss rate for AI-ML and Graph workloads
    • High L1I miss rate for GTraces v2

4 of 17

Profiling the baseline (L1 Berti + L2 Pythia)

  • L1D memory access trace analysis
    • Graph workloads: A[B[i]] accesses due to the CSR graph format
      • The traces contain no load values, and the prefetcher is not allowed to read values from the cache
    • AI-ML workloads: Stream access pattern (especially llama2.c)
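
To make the contrast between the two patterns concrete, the sketch below generates the address sequence of a plain stream and of an indirect A[B[i]] access, then compares their address deltas. The base addresses, element size, and index values are hypothetical and are not taken from the DPC-4 traces.

```python
def stream_addresses(base, count, elem_size=4):
    """Addresses touched by a sequential scan: A[0], A[1], ..."""
    return [base + i * elem_size for i in range(count)]


def indirect_addresses(base, index_array, elem_size=4):
    """Addresses touched by A[B[i]], where index_array plays the role of B."""
    return [base + idx * elem_size for idx in index_array]


def deltas(addrs):
    """Differences between consecutive addresses."""
    return [b - a for a, b in zip(addrs, addrs[1:])]


# Hypothetical CSR column indices (the values of B[i]).
col_idx = [7, 2, 19, 4, 11]

stream_deltas = deltas(stream_addresses(0x1000, 5))
indirect_deltas = deltas(indirect_addresses(0x8000, col_idx))
```

The stream yields a single repeating delta that an address-based prefetcher can learn, while the indirect deltas depend on the contents of B, which the prefetcher cannot observe under the championship rules.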

5 of 17

Dive into the DPC3 Berti used in baseline

  • Accurate in predicting stream accesses
  • A miss occurs once per 4 KB page -> cross-page prefetch opportunities are missed

[Figure: address vs. time plot; legend: PF issued, Hit & PF useful, Miss]

Incorporate stride prefetchers to cover the cross-page misses
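
The once-per-page miss pattern can be reproduced with a toy model. The sketch below assumes 64-byte cache lines, 4 KB pages, a next-line prefetcher, and perfectly timely prefetches; it is a simplification for illustration and does not model Berti itself. The function name and parameters are hypothetical.

```python
LINE_SIZE = 64
PAGE_SIZE = 4096
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE  # 64 lines per 4 KB page


def count_stream_misses(num_lines, allow_cross_page):
    """Count demand misses for a sequential stream of cache lines.

    After every access the model prefetches the next line, unless that
    line lies in a different page and cross-page prefetching is disabled.
    """
    prefetched = set()
    misses = 0
    for line in range(num_lines):
        if line not in prefetched:
            misses += 1
        nxt = line + 1
        same_page = nxt // LINES_PER_PAGE == line // LINES_PER_PAGE
        if same_page or allow_cross_page:
            prefetched.add(nxt)
    return misses


# Stream over four pages: one miss per page when prefetches stop at the
# page boundary, a single cold miss when they may cross it.
page_bounded = count_stream_misses(4 * LINES_PER_PAGE, allow_cross_page=False)
cross_page = count_stream_misses(4 * LINES_PER_PAGE, allow_cross_page=True)
```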

6 of 17

Smart Stride Prefetcher

  • A stride prefetcher derived from the xs_stride prefetcher in XiangShan
  • Confidence-directed stride pattern detection for accuracy
  • Depth adjustment based on detection of late prefetches for timeliness
  • Strides are detected at multiples of the delta to mitigate out-of-order (OoO) execution effects
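
The three ideas above can be sketched as a small PC-indexed stride table. This is a generic illustration, not the XiangShan xs_stride logic or the sBerti code: the class names, thresholds, and the rule "a positive multiple of the learned stride counts as a match" are assumptions made for the example.

```python
class StrideEntry:
    def __init__(self, addr):
        self.last_addr = addr
        self.stride = 0
        self.conf = 0
        self.depth = 1


class StrideSketch:
    CONF_MAX = 3        # saturating confidence counter (assumed width)
    CONF_THRESHOLD = 2  # minimum confidence before prefetching (assumed)
    DEPTH_MAX = 8       # maximum prefetch depth (assumed)

    def __init__(self):
        self.table = {}  # pc -> StrideEntry

    def access(self, pc, addr):
        """Train on one access; return a prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(addr)
            return None
        delta = addr - entry.last_addr
        if delta == 0:
            return None
        # A delta that is a positive multiple of the learned stride still
        # counts as a match, so skipped or reordered accesses do not reset
        # the pattern.
        matches = (entry.stride != 0
                   and delta % entry.stride == 0
                   and delta // entry.stride > 0)
        if matches:
            entry.conf = min(entry.conf + 1, self.CONF_MAX)
        else:
            entry.conf = max(entry.conf - 1, 0)
            if entry.conf == 0:
                entry.stride = delta  # retrain only once confidence is gone
        entry.last_addr = addr
        if entry.conf >= self.CONF_THRESHOLD:
            return addr + entry.stride * entry.depth
        return None

    def report_late(self, pc):
        """A prefetch for this PC arrived late: look further ahead."""
        entry = self.table.get(pc)
        if entry is not None:
            entry.depth = min(entry.depth + 1, self.DEPTH_MAX)


p = StrideSketch()
warmup = [p.access(1, a) for a in (0, 64, 128, 192)]
p.report_late(1)
after_late = p.access(1, 256)
after_skip = p.access(1, 384)  # delta of 128 is a multiple of stride 64
```

No prefetch is issued until the stride has been confirmed twice, and reporting a late prefetch doubles the look-ahead distance from one stride to two in this run.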

7 of 17

sBerti = Berti + Smart Stride Prefetcher

  • Useful prefetches: 846 -> 995 (about 18% more)

[Figure: two panels, Berti in DPC3 and sBerti; legend: PF issued, Hit & PF useful, Miss]

8 of 17

Can vBerti cover the cross-page pattern?

  • Observation
    • The Prefetch Queue is full most of the time, so prefetches with large strides (depth) are less likely to be issued.
    • The highest confidence is observed with stride 1.
    • Addresses in the Prefetch Queue are stalled
  • Still debugging these issues

[Figure: vBerti; legend: PF issued, Hit & PF useful]

9 of 17

sBerti Storage Overhead

  • Smart Stride Prefetcher: ~4KB
    • Stride detection table and a deduplication buffer
  • Berti: ~28KB
    • The number of recorded pages is reduced to fit the total 32 KB budget
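
The slides do not describe how the deduplication buffer works. One plausible reading is a small FIFO of recently issued line addresses that filters out requests proposed by both prefetchers. The sketch below illustrates that reading only; the class, the function, and the capacity are hypothetical.

```python
from collections import deque


class DedupBuffer:
    """Small FIFO of recently issued cache-line addresses."""

    def __init__(self, capacity=16):
        # deque with maxlen drops the oldest entry when full.
        self.recent = deque(maxlen=capacity)

    def should_issue(self, line_addr):
        """Return True and record the line if it was not issued recently."""
        if line_addr in self.recent:
            return False
        self.recent.append(line_addr)
        return True


def merge_candidates(berti_lines, stride_lines, buf):
    """Combine candidates from both prefetchers, dropping duplicates."""
    combined = list(berti_lines) + list(stride_lines)
    return [line for line in combined if buf.should_issue(line)]


buf = DedupBuffer(capacity=4)
first = merge_candidates([10, 11], [11, 12], buf)
second = merge_candidates([12], [13], buf)
```

A bounded FIFO keeps the storage cost fixed, at the price of occasionally re-issuing a line once its entry has been evicted.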

10 of 17

sBerti Coverage in FullBW

  • Slight L1D coverage increase for AI-ML and SPEC17 workloads

11 of 17

sBerti Accuracy in FullBW

  • Accuracy drop across all categories
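
Coverage and accuracy can move in opposite directions, as a short calculation shows. The sketch below uses the common definitions (coverage as the fraction of would-be misses removed by prefetching, accuracy as the fraction of issued prefetches that are useful), which are assumed here rather than taken from the slides. All counts are hypothetical and are not measurements from the sBerti evaluation.

```python
def coverage(useful_prefetches, remaining_misses):
    """Fraction of would-be misses that prefetching removed."""
    total = useful_prefetches + remaining_misses
    return useful_prefetches / total if total else 0.0


def accuracy(useful_prefetches, issued_prefetches):
    """Fraction of issued prefetches that turned out to be useful."""
    return useful_prefetches / issued_prefetches if issued_prefetches else 0.0


# Hypothetical counts: adding a second prefetcher produces more useful
# prefetches but issues disproportionately more requests.
base_cov, base_acc = coverage(80, 120), accuracy(80, 100)
hybrid_cov, hybrid_acc = coverage(100, 100), accuracy(100, 160)
```

In this made-up case coverage rises from 0.4 to 0.5 while accuracy falls from 0.8 to 0.625.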

12 of 17

sBerti Speedup over Berti

  • IPC speedup on workloads with over 50% stream access patterns

13 of 17

Speedup over baseline (Berti + Pythia)

  • Our solution: L1 sBerti + L2 Pythia
  • Slight performance gains on AI-ML (7%) and SPEC17 (2%) workloads under the FullBW configuration
  • Performance remains the same under the LimitBW configuration
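
A per-category speedup like the figures above is typically computed as the geometric mean of per-trace IPC ratios. The slides do not state the aggregation method, so the geometric mean is an assumption here, and the IPC values in the sketch are hypothetical.

```python
import math


def geomean_speedup(new_ipcs, base_ipcs):
    """Geometric mean of per-trace IPC ratios (new / baseline)."""
    if not new_ipcs or len(new_ipcs) != len(base_ipcs):
        raise ValueError("need two non-empty lists of equal length")
    ratios = [new / base for new, base in zip(new_ipcs, base_ipcs)]
    # Averaging in log space avoids overflow for long trace lists.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical IPCs for two traces: speedups of 2x and 4x.
example = geomean_speedup([2.0, 8.0], [1.0, 2.0])
```

A result of 1.0 means no change relative to the baseline, which corresponds to the LimitBW outcome reported above.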

14 of 17

Conclusion

  • sBerti is a hybrid prefetcher of Berti and a Stride Prefetcher derived from XiangShan
  • sBerti addresses the cross-page weakness of the DPC3 Berti and achieves ~7% speedup on AI-ML workloads
  • Future Work
    • Further optimization of sBerti for prefetch accuracy
    • Investigating vBerti as a way to solve the cross-page boundary problem

  • Special thanks to the reviewers for their constructive comments and to the DPC-4 organizers for their excellent work

15 of 17

sBerti: Enhancing Berti with a Smart Stride Prefetcher for Better Coverage

Jiapeng Zhou, Ben Chen, Kunlin Li, and Yun Chen. DPC4.

Thanks for your attention!

Questions?

16 of 17

Debug log of vBerti

17 of 17

Debug log of vBerti
