Is this memory safety here in the room with us?
Halvar Flake / Thomas Dullien
DistrictCon 0 2025
Why memory safety?
The 40,000-foot view.
Why do people write software?
What I need
What I have
Abstract view
Our software is an “intended” FSM emulated on a real-world CPU - the CPU has many more states, but our intent is to restrict it to those that “make sense” as FSM states.
An unintended state is entered
An event triggers a transition into a state that is “nonsensical” or “unintended” when viewed through the FSM lens.
Trying to transition as if it were a sane state
Further events make the software attempt to transition to the next FSM state (see red arrow), but the state is “broken”.
The weird machine
Transforming a broken state leads to a new broken state.
The weird machine
Attackers can continue driving the machine into new states, possibly reaching “all states” (or at least many that violate expected security properties).
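A toy sketch of this (my illustration, not the talk's): a 4-state FSM emulated in an 8-bit state variable. Inside the sane set, no event sequence escapes; once the state is corrupted, the attacker picks events to steer to any of the 256 machine states.

```rust
// Intended FSM: 4 states, emulated in a u8 with 256 machine states.
fn step(state: u8, event: u8) -> u8 {
    if state < 4 {
        (state + event % 4) % 4 // intended transitions stay in 0..=3
    } else {
        // "Trying to transition as if it were a sane state":
        state.wrapping_add(event)
    }
}

fn main() {
    // Inside the intended FSM, no sequence of events leaves states 0..=3.
    let mut s: u8 = 1;
    for e in [3, 1, 2, 0] {
        s = step(s, e);
        assert!(s < 4);
    }

    // One corruption puts the state outside the sane set...
    s = 200;
    // ...after which the attacker chooses events to reach ANY state:
    let target: u8 = 0x41;
    let event = target.wrapping_sub(s);
    assert_eq!(step(s, event), target);
    println!("corrupted state steered to {:#x}", target);
}
```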
Nested state spaces in computing
Possible physical states of the computational device
Observable states of the computational device
Documented possible states of the computational device
“Sane” states of the computational device running the software
Program execution should follow trajectories through “intended”, “sane” states
During exploitation, a state outside the “intended”, “sane” set of states is reached
The attacker carefully controls the trajectory through those “weird” states
Memory safety attempts to put an extra “wall” into this diagram.
“Memory safe” states.
Fewer → more states of the machine reachable:
Small FSM → RegExp → Java/Go → Safe Rust → C/C++ → Unsafe Rust → Assembly
What does memory safety provide?
Corrupting a pointer or an array index, or writing through a pointer after its memory has been released, throws nearly all statements about the state of the machine out of the window.
Corrupt memory tends to let the unicorns escape
”Here be unicorns”
Why is memory corruption special?
What does memory safety provide?
If memory safety is maintained, the abstract machine that the language defines stays intact in the presence of most other bugs.
A link between the language syntax (and the source code) and behavior of the machine is maintained.
A horse stays a horse and does not grow wings and a horn.
Example: Graph of variables assignments
typeof(LHS) ← typeof(RHS)
How is memory safety usually achieved?
Memory safety is commonly viewed as two components
In theory, you could prove for a given C/C++ program that it satisfies these properties. In practice for most codebases, this isn’t done, so languages (or hardware) are modified to have safety mechanisms.
These safety mechanisms can be implemented in runtime, during compile-time, or a combination of both.
Application logic can still become arbitrarily confused, but the goal is to prevent such confusion from ever allowing the dereference of a corrupt pointer.
Obtaining spatial safety
Spatial safety is usually (not always) obtained through the following steps:
These safety mechanisms are implemented in a combination of runtime and compile-time.
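A minimal illustration of the runtime half, using Rust's compiler-inserted bounds checks as the example mechanism:

```rust
use std::panic;

// Spatial safety in practice: every slice index in safe Rust carries
// a bounds check inserted by the compiler.
fn read_byte(buf: &[u8], idx: usize) -> u8 {
    buf[idx] // compiled with an implicit `idx < buf.len()` check
}

fn main() {
    let buf = vec![0u8; 16];
    assert_eq!(read_byte(&buf, 15), 0); // in bounds: plain load
    // Out of bounds: the check fires and panics instead of reading
    // whatever happens to live after the allocation.
    let oob = panic::catch_unwind(|| read_byte(&vec![0u8; 16], 16));
    assert!(oob.is_err());
    println!("out-of-bounds read trapped");
}
```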
Obtaining temporal safety
There are different approaches for obtaining temporal safety. The common ones are:
All of these approaches require the coordination between the compiler/interpreter and the runtime.
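One such approach, sketched in Rust (illustrative, not a prescription): refcounted ownership plus weak handles makes a stale reference fail observably instead of dangling.

```rust
use std::rc::{Rc, Weak};

// Temporal safety via ownership: the allocation dies with its last
// strong owner, and stale handles fail observably instead of dangling.
fn escape_a_handle() -> Weak<String> {
    let owner = Rc::new(String::from("session data"));
    Rc::downgrade(&owner)
    // `owner`, the only strong reference, is dropped here; memory freed.
}

fn main() {
    let stale = escape_a_handle();
    // The would-be use-after-free surfaces as `None`, not a dangling read:
    assert!(stale.upgrade().is_none());

    let owner = Rc::new(String::from("still alive"));
    let live = Rc::downgrade(&owner);
    assert!(live.upgrade().is_some()); // owner alive: access permitted
}
```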
The different flavors of memory safety
Flavor 1: Whole-program analysis
Flavors of memory safety: (1) Proving absence
ASTREE and Airbus Avionics
Benefits of this approach:
Downsides
Flavor 2: Garbage Collection and runtime array checking
GC and runtime bounds checking
Garbage collection: Java, C#, Go, Python etc.
The most important wall of our time
The memory wall. Why linked lists suck. Cache rules everything around me.
Cost of garbage collection
Hertz/Berger 2005: GC heap size vs. perf tradeoff
Cycle equivalence at 5x RAM consumption.
More than 50% more cycles at 2x RAM consumption
GC everywhere: Do I pay more DRAM or more cycles?
Napkin math:
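A hedged sketch of such napkin math, plugging in the Hertz/Berger trade-off above. The unit prices are made up for illustration; the talk's actual numbers are not reproduced here.

```rust
// ASSUMED unit prices, purely illustrative; real numbers depend on your fleet.
const DRAM_DOLLARS_PER_GB: f64 = 3.0; // assumption
const DOLLARS_PER_CORE: f64 = 50.0;   // assumption

// Hertz/Berger option A: ~5x heap for the GC, at cycle parity.
fn extra_cost_5x_ram(heap_gb: f64) -> f64 {
    (5.0 - 1.0) * heap_gb * DRAM_DOLLARS_PER_GB
}

// Option B: ~2x heap, but >50% more cycles (modeled as 50% more cores).
fn extra_cost_2x_ram(heap_gb: f64, cores: f64) -> f64 {
    (2.0 - 1.0) * heap_gb * DRAM_DOLLARS_PER_GB + 0.5 * cores * DOLLARS_PER_CORE
}

fn main() {
    let (heap_gb, cores) = (8.0, 4.0);
    println!("extra cost at 5x RAM:          ${:.0}", extra_cost_5x_ram(heap_gb));
    println!("extra cost at 2x RAM + cycles: ${:.0}", extra_cost_2x_ram(heap_gb, cores));
}
```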
Quick note on language design and GC: Go vs. Java
Flavor 3: Reference counting and runtime array checking
Reference counting
Pro/Cons of reference counting
Pro: Compact heap.
Con: Synchronization performance hit.
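Both points in one small sketch (illustrative, using Rust's Arc): every clone/drop is an atomic refcount update, which is the synchronization hit; the payoff is a compact heap freed deterministically at count zero.

```rust
use std::sync::Arc;
use std::thread;

// Fan a refcounted buffer out to reader threads and sum what they see.
fn share_across_threads(data: Vec<i32>, readers: usize) -> usize {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let copy = Arc::clone(&shared); // atomic increment
            thread::spawn(move || copy.len()) // atomic decrement when dropped
        })
        .collect();
    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    // Back to a single owner; the allocation is freed exactly when the
    // last Arc goes away - no heap bloat, no GC pause.
    assert_eq!(Arc::strong_count(&shared), 1);
    total
}

fn main() {
    assert_eq!(share_across_threads(vec![1, 2, 3], 4), 12);
}
```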
Flavor 4: Strict ownership semantics, lifetimes, and runtime array checking (Rust)
Rust’s big contribution
“Rewrite it in Rust”
Industry traction
Pro/Cons of strict ownership semantics
Pros:
Con:
Rust forces specific architectural choices on the programmer. These are often, but not always, the right choices for the task.
(Flavor 5: C++ safety profiles and 21st century C++)
Pro/Cons of 21st century C++
Pro: Backward compatibility, incremental porting.
Con: Only exists on paper.
Current hardware approaches: MT and CHERI
Hardware approaches: MT
Historically, most memory safety approaches were software-only.
Over the last few years, memory tagging has entered the discussion (and even implementation); it allows a limited, probabilistic form of memory safety to be hardware-enforced.
MT modifies malloc to “tag” memory (using special instructions) with a few tag bits. These bits are also stored in the upper bits of 64-bit pointers that the architecture ignores (usually bits 57 through 63).
On memory dereference, these bits are compared (by the hardware) to the tag, and an exception is raised when they don’t match.
Relatively easily retrofitted to existing systems (but DRAM cost!)
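A toy software model of the tag check (the real mechanism lives in hardware and the allocator; names and constants here are illustrative):

```rust
const TAG_SHIFT: u32 = 57; // tag mirrored into otherwise-ignored pointer bits

// malloc stores a 4-bit tag for the granule and mirrors it into the pointer.
fn tag_pointer(addr: u64, tag: u8) -> u64 {
    addr | ((tag as u64 & 0xf) << TAG_SHIFT)
}

// On dereference, hardware compares the pointer's tag bits against the
// tag stored for the memory granule and faults on mismatch.
fn check(ptr: u64, memory_tag: u8) -> Result<u64, &'static str> {
    let ptr_tag = ((ptr >> TAG_SHIFT) & 0xf) as u8;
    if ptr_tag == memory_tag {
        Ok(ptr & !(0xfu64 << TAG_SHIFT)) // strip tag, perform the access
    } else {
        Err("tag check fault")
    }
}

fn main() {
    let p = tag_pointer(0x1000, 0xA); // allocation tagged 0xA
    assert!(check(p, 0xA).is_ok()); // matching tag: access proceeds
    // After free+retag, a stale pointer faults with probability 15/16:
    // only a few tag bits, hence "probabilistic" safety.
    assert!(check(p, 0x3).is_err());
}
```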
Hardware approaches: CHERI
Custom CPU cores (historically MIPS, now RISC-V) with capabilities.
Fat pointers with bounds and permissions encoded.
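A toy model of such a fat pointer (illustrative only; real CHERI compresses bounds and also encodes permissions and provenance):

```rust
// A "capability": a pointer that carries its own bounds, checked on use.
#[derive(Clone, Copy)]
pub struct Cap {
    pub base: usize, // lower bound into "memory"
    pub len: usize,  // length of the region this capability may touch
}

// Every load goes through the bounds check; on real hardware an
// out-of-bounds access raises a capability fault.
pub fn cap_load(mem: &[u8], cap: Cap, idx: usize) -> Result<u8, &'static str> {
    if idx >= cap.len {
        return Err("capability bounds fault");
    }
    Ok(mem[cap.base + idx])
}

fn main() {
    let mem = [1u8, 2, 3, 4, 5, 6, 7, 8];
    let cap = Cap { base: 2, len: 3 }; // may only see mem[2..5]
    assert_eq!(cap_load(&mem, cap, 0), Ok(3));
    assert!(cap_load(&mem, cap, 3).is_err()); // one past the bound: fault
}
```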
Honorable mention: MiraclePtr
Retrofitting UAF safety into Chrome by adding reference counting - more a mitigation than memory safety. Doesn’t help against iterator invalidation etc.
wipe sweat off brow
Observations
Local reasoning vs. global problems
Using a more powerful type system to turn global problems into locally checkable problems seems to work.
Local reasoning vs. global problems
If you squint, you are proving local properties on each function, and then composing local proofs into a whole-program proof of safety.
Copious annotations (in the form of types) are needed to make the proofs work.
As the type-checker (theorem prover) becomes more powerful, fewer annotations are needed (lifetime elision).
In the limit, the type system approach and the program analysis approach converge, from different sides.
Rust is already adding a Prolog-style theorem prover (Chalk) to the compiler to deal with implications in the type system.
TANSTAAFL
Where does this leave us?
Building a memory-safe userspace network application (an SMTP server, for example) is a solved problem.
We can write memory safe userspace services
Great, we are safe then?
What is not (yet?) covered by existing mechanisms?
Writing safe unsafe Rust is not easy
Shared memory TOCTOU
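A sketch of the double-fetch variant of this bug class (names and layout are illustrative; the peer's write is simulated inline to keep it deterministic):

```rust
// A length field in shared memory is validated once, then re-read
// after the untrusted peer has changed it.
fn read_message(shared_len: &mut usize, buf: &[u8]) -> Option<Vec<u8>> {
    // Time of check:
    if *shared_len > buf.len() {
        return None;
    }
    // The peer races us here and rewrites the field:
    *shared_len = 64;
    // Time of use: re-fetching the value makes the earlier check worthless.
    // Safe Rust still bounds-checks the slice; through raw pointers or an
    // FFI boundary, this would be an out-of-bounds read.
    buf.get(..*shared_len).map(|s| s.to_vec())
}

fn main() {
    let mut shared_len = 4usize; // the "shared memory" field
    let buf = [0u8; 8];
    assert!(read_message(&mut shared_len, &buf).is_none());
}
```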
Surprising callbacks out of the type system
Classical C++ browser bug pattern:
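One well-known instance of the pattern, sketched in Rust for demonstrability (not actual browser code): a callback mutates the collection being iterated. C++ would keep using the invalidated iterator; here the aliasing is detected at runtime.

```rust
use std::cell::RefCell;
use std::panic::{catch_unwind, AssertUnwindSafe};

// While iterating a listener list, a scripted "callback" appends to the
// same list, invalidating the iteration in flight.
fn mutate_while_iterating() -> bool {
    let listeners = RefCell::new(vec!["a", "b"]);
    catch_unwind(AssertUnwindSafe(|| {
        for _listener in listeners.borrow().iter() {
            // the callback fires mid-iteration and mutates the list:
            listeners.borrow_mut().push("c"); // already borrowed: panic
        }
    }))
    .is_err()
}

fn main() {
    assert!(mutate_while_iterating()); // the invalidation was caught
}
```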
If you rely on your type system to provide memory safety, the invariants of the type system need to be kept intact by any other language you call into.
This is conceptually a variant of shared-memory TOCTOU bugs.
Issues around FFI (and GC and type systems)
Subtleties about FFIs and memory-safe languages can fill a book.
Dynamic linking in Rust was historically a nightmare:
Array indices as proto pointers
Both have their advantages and disadvantages.
Array indices as proto pointers
The pool-of-nodes approach raises a philosophical question:
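A sketch of why the question arises (my illustration): a recycled pool slot makes a stale index resolve to the wrong object. The program is memory-safe by construction, yet this is a use-after-free at the application level.

```rust
// Pool of nodes: "pointers" are indices into a Vec, with a free list.
pub struct Arena {
    pub nodes: Vec<String>,
    free: Vec<usize>,
}

impl Arena {
    pub fn new() -> Arena {
        Arena { nodes: Vec::new(), free: Vec::new() }
    }
    pub fn alloc(&mut self, v: &str) -> usize {
        if let Some(i) = self.free.pop() {
            self.nodes[i] = v.to_string(); // recycle a released slot
            i
        } else {
            self.nodes.push(v.to_string());
            self.nodes.len() - 1
        }
    }
    pub fn release(&mut self, i: usize) {
        self.free.push(i);
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.alloc("alice");
    arena.release(a);           // slot recycled...
    let b = arena.alloc("bob"); // ...and handed out again
    assert_eq!(a, b);
    // The stale handle `a` still "works", but names the wrong object:
    assert_eq!(arena.nodes[a], "bob");
}
```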
JIT miscompiles
The majority of exploited browser bugs in recent years were not issues of memory safety.
Hardware errata
If the hardware misbehaves, all bets are obviously off.
GPU and xPU interactions
What’s next?
The role of LLMs and AI in the process
The role of AI
Research & Engineering ahead
Research & Engineering topics
Research & Engineering topics
With memory corruption, anything is possible
Hitler was a leftist