1 of 16

Imprecise Store Exceptions

Siddharth Gupta, Yuanlong Li, Qingxuan Kang, Abhishek Bhattacharjee, Babak Falsafi, Yunho Oh, Mathias Payer

midgard.epfl.ch

2 of 16

Memory Capacity is Increasing

Future servers will have access to PBs of memory capacity

[Figure: memory capacity by technology, rising from GBs to ~100 GBs, ~10 TBs, and PBs]

3 of 16

Virtual Memory (VM)

Address translation + permission check

  • On the critical path of memory accesses

TBs-PBs of memory capacity in servers

  • Cannot increase the TLB capacity
  • 1000 entries ~16KB of SRAM
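A rough reach calculation shows why ~1000 entries fall short at this scale. The sketch below uses the slide's figures (1000 entries, ~16KB of SRAM); the 4 KiB and 2 MiB page sizes are assumptions chosen for illustration and do not come from the slides.

```python
# Rough TLB-reach arithmetic. Page sizes are assumptions, not from the slides.
KIB, MIB, TIB = 2**10, 2**20, 2**40

TLB_ENTRIES = 1000          # from the slide
TLB_SRAM_BYTES = 16 * KIB   # from the slide (~16KB)

def tlb_reach(entries, page_bytes):
    """Bytes of memory covered when every entry maps one page."""
    return entries * page_bytes

def entries_needed(capacity_bytes, page_bytes):
    """Entries required to cover a given capacity (ceiling division)."""
    return -(-capacity_bytes // page_bytes)

reach_4k = tlb_reach(TLB_ENTRIES, 4 * KIB)        # assumed 4 KiB pages
reach_2m = tlb_reach(TLB_ENTRIES, 2 * MIB)        # assumed 2 MiB pages
need_10tb = entries_needed(10 * TIB, 2 * MIB)     # entries to cover ~10 TBs
bytes_per_entry = TLB_SRAM_BYTES / TLB_ENTRIES

print(f"reach with 4 KiB pages: {reach_4k / MIB:.1f} MiB")
print(f"reach with 2 MiB pages: {reach_2m / 2**30:.2f} GiB")
print(f"entries to cover 10 TiB with 2 MiB pages: {need_10tb}")
print(f"SRAM per entry: {bytes_per_entry:.1f} bytes")
```

Under these assumptions, even with 2 MiB pages the TLB covers about 2 GiB, several orders of magnitude below the TB-to-PB capacities discussed here.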

Performance bottleneck at the core for high memory capacity

[Figure: core issuing memory operations (ST, ST, LD, ST); labels: Permission Check, ~1000 TLB entries, $ (cache), Memory]

4 of 16

Previous Proposals

Virtual cache hierarchies

Intermediate address spaces

  • Midgard [ISCA’21]
  • VBI [Hajinazar, ISCA’20]

Proposals to push address translation towards memory

[Figure: core issuing memory operations (ST, ST, LD, ST); labels: Permission Check, ~1000 TLB entries, $ (cache), Page Table Walk, Memory]

5 of 16

Late Exception Detection

Store instructions can retire before completion

  • Exception handling obstructed
  • Identified by Qiu and Dubois in ISCA’99

Precise handling requires too much state

  • 10-20KB per core
  • Nullifies the benefit from TLBs

Memory consistency model bars precise exception handling

[Figure: core issuing memory operations (ST, ST, LD, ST) and one further ST; labels: Permission Check, $ (cache), Page Table Walk, Memory]

6 of 16

This Paper: Imprecise Store Exceptions

Retired-but-incomplete stores are pending

Handle store exceptions imprecisely

Exceptions triggered imprecisely on an unrelated instruction

[Figure: a single ST; labels: Permission Check, $ (cache), Page Table Walk, Memory]

7 of 16

Contributions

OS + microarchitecture codesign with formalism

  • Exceptions are infrequent, so performance is not the primary concern
  • Correctness is! Formalism ensures that the design is memory-model compliant

Key results

  • End-to-end evaluation with a full-system RISC-V prototype
  • Negligible silicon overhead to maintain original performance
  • Codesign for VM does not impact application programmers

OS + microarchitecture codesign with formalism for exception handling

8 of 16

Reason for Imprecision

Out-of-order cores retire stores early

  • After data and address are confirmed
  • The store need not have completed
    • Pipeline can move forward
    • Key for relaxed memory consistency models
  • Store exceptions might be detected after retirement
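The timing gap described above can be illustrated with a toy model. This is a minimal sketch, not the paper's mechanism: it assumes one instruction retires per step and that a store's permission check completes a fixed number of steps after retirement (`drain_delay`, a hypothetical parameter). The function name is also illustrative.

```python
def find_imprecise_faults(program, bad_addrs, drain_delay=2):
    """Toy model of late store-exception detection.

    program: list of (op, addr) in program order; one instruction retires per step.
    A store's permission check is assumed to finish drain_delay steps after it retires.
    Returns a list of (store_index, index_of_instruction_retiring_at_detection).
    """
    faults = []
    for i, (op, addr) in enumerate(program):
        if op == "st" and addr in bad_addrs:
            # By the time the fault is seen, retirement has moved past the store.
            detected_at = min(i + drain_delay, len(program) - 1)
            faults.append((i, detected_at))
    return faults

program = [("st", 0x1000), ("add", None), ("ld", 0x2000), ("st", 0x3000)]
faults = find_imprecise_faults(program, bad_addrs={0x1000})
print(faults)  # [(0, 2)]
```

In this toy run the store at index 0 faults, but the fault surfaces while the unrelated load at index 2 is retiring, so the architectural state no longer corresponds to the faulting store.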

Exceptions generated after store retirement cannot be handled precisely

[Figure: a ST in the Core retires from the Load Store Queue into the Store Buffer; labels: Retire, Page Table Walk, Page Fault]

9 of 16

Imprecise Store Exceptions: Design

Problem: Stores pending in the store buffer

  • Cannot be written to the cache hierarchy
  • Exception resolution might take tens of ms
  • Store buffer is full = core is stalled!

Proposal: Hand off stores to the OS

  • Record stores (data, address, order) in a pre-allocated memory region
  • OS reads the stores and performs them after exception resolution
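The record-and-replay handoff can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's interface: `StoreRecord`, `StoreLog`, `os_replay`, the capacity of 64, and the `resolve` callback are all hypothetical names and values. Only the recorded fields (data, address, order) come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoreRecord:
    order: int     # position among the recorded stores
    address: int
    data: int

class StoreLog:
    """Stand-in for the pre-allocated memory region (capacity is hypothetical)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.records = []

    def record(self, address, data):
        # Hardware side: append the pending store together with its order.
        if len(self.records) >= self.capacity:
            raise OverflowError("pre-allocated region is full")
        self.records.append(StoreRecord(len(self.records), address, data))

def os_replay(log, memory, resolve):
    """OS side: resolve the exception first, then perform the stores in order."""
    for rec in log.records:
        resolve(rec.address)              # e.g. map the missing page
    for rec in sorted(log.records, key=lambda r: r.order):
        memory[rec.address] = rec.data
    log.records.clear()

mapped_pages = set()
memory = {}
log = StoreLog()
log.record(0x1000, 1)
log.record(0x2008, 2)
log.record(0x1000, 3)   # younger store to the same address must win
os_replay(log, memory, lambda addr: mapped_pages.add(addr >> 12))
print(memory)
```

Keeping the order field explicit means the replay step does not depend on how the region is laid out in memory, which is one possible design choice among several.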

Microarchitecture will record, OS will replay the stores

10 of 16

Implementation

Microarchitecture records stores in an in-memory queue

  • Stores directly impacted by the exception
  • Stores stuck because of memory consistency requirements
  • E.g., save all younger stores in Processor Consistency

OS reads the recorded stores in the exception handler

  • Replay stores in order after exception resolution
  • Resume application
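The Processor Consistency example above (save the faulting store and all younger stores) can be sketched like this. It is an illustration only: the tuple layout, `split_on_fault`, and the assumption that older stores are free to drain normally are not taken from the slides.

```python
from collections import deque

def split_on_fault(store_buffer, faulting_seq):
    """Partition a store buffer (oldest first) at the faulting store.

    Returns (older stores, stores to record). Under the slide's Processor
    Consistency example, the faulting store and every younger store are recorded.
    """
    older = [s for s in store_buffer if s[0] < faulting_seq]
    to_record = [s for s in store_buffer if s[0] >= faulting_seq]
    return older, to_record

# Each entry is (seq, address, data), oldest first.
store_buffer = [(0, 0x100, 10), (1, 0x2000, 11), (2, 0x108, 12), (3, 0x2008, 13)]
older, to_record = split_on_fault(store_buffer, faulting_seq=1)

# Assumption: older stores complete normally before the handler runs.
memory = {addr: data for _, addr, data in older}

# The per-core controller fills the in-memory queue; the OS drains it in order.
in_memory_queue = deque(to_record)
replayed = []
while in_memory_queue:
    seq, addr, data = in_memory_queue.popleft()
    memory[addr] = data
    replayed.append(seq)

print(replayed)  # [1, 2, 3]
```

Note that store 2 targets a different address than the faulting store, yet it is still recorded: making it visible before store 1 would reorder two stores, which the Processor Consistency example rules out.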

Simple per-core controllers to record stores

[Figure: four stores (ST) passing from the µArch through the In-Memory Queue to the OS]

11 of 16

Correctness Through Formalism

Does the final design comply with the memory consistency model?

  • We can prove it formally!
  • Developed memory consistency formalism for the design
  • Proved that imprecise exceptions can comply with popular models

Formalism is essential for proving compliance with the memory model

Proofs in the paper

12 of 16

Not Just Virtual Memory!

Imprecise exceptions might be caused by

  • Any logic that interacts with stores
  • Cannot generate the required cache block
  • Events such as page faults
  • E.g., accelerators such as täkō [Schwedock, ISCA’22]

[Figure: täkō alongside the store path; labels: Core, ST, Load Store Queue, Retire, Store Buffer, Page Fault]

13 of 16

Evaluation

Full-system prototype with FireSim

  • XiangShan [Xu, MICRO’22]
  • 2x RISC-V out-of-order cores
  • Linux 5.15

Results

  • End-to-end evaluation: no performance impact
  • Correctness through memory consistency litmus tests
  • Negligible silicon overhead

Artifact available on:

github.com/parsa-epfl/imprecise_store_exceptions

14 of 16

Conclusion

  • High-capacity memory hierarchies have a performance bottleneck due to VM
  • Recent proposals delay address translation, leading to imprecise exceptions
  • We propose OS + microarchitecture codesign to handle exceptions
    • Developed formalism for OS developers to ensure correctness
    • Full-system RISC-V prototype shows correctness and performance
    • Application-level changes not required for VM exception handling

15 of 16

16 of 16

Memory Consistency Litmus Tests

Ordering relation                Cases covered
Dependencies                     2366
Program order (same location)    368
Preserved program order          733
External read-from order         1544
Internal read-from order         1304
Coherence order                  747
From-read order                  976
Barriers                         1581

All litmus tests pass on the RISC-V prototype

RISC-V Litmus Tests [Alglave, CAV’10]

  • Automatically generates relevant cases
  • Tests the weak memory model
  • Runs on top of Linux
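For readers unfamiliar with litmus tests, the idea is a tiny multi-threaded program plus the set of outcomes a memory model allows. The sketch below is not the cited tool and does not model the RISC-V weak memory model: it only enumerates sequentially consistent interleavings of the classic message-passing shape, and all names are illustrative. A weaker model may allow additional outcomes.

```python
from itertools import combinations

# Message-passing shape: T0 writes data then flag; T1 reads flag then data.
T0 = [("st", "x", 1), ("st", "y", 1)]
T1 = [("ld", "y", "r0"), ("ld", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, loc, arg in schedule:
        if op == "st":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r0"], regs["r1"]

outcomes = {run(s) for s in interleavings(T0, T1)}
print(sorted(outcomes))  # [(0, 0), (0, 1), (1, 1)]
```

Under this simplified model the outcome (r0, r1) = (1, 0) never appears. A litmus run on hardware checks that observed outcomes stay within whatever set the target model permits.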