1 of 38

rustc

Mark Mansi

2 of 38

Outline

  • Fast Rust primer
  • rustc Architecture
  • Other Considerations for Compiler Builders

3 of 38

Key Takeaways

  • PL design choices impact Compiler design, architecture, performance

  • Compiler design involves more than generating binaries

4 of 38

Very Fast Rust Primer

5 of 38

Rust Primer

  • Low-level, high-perf, no language runtime (like C/C++)
  • Originally targeted systems programming

  • Focus on memory safety/correctness + practicality
  • Statically, strongly typed
  • Secret sauce: lifetime/borrow checking for references

6 of 38

pub fn main () {

// Variable declarations...

let foo = 30;

let mut bar = 50;

// if-else

if foo == bar {

bar += 30;

}

// Standard data structures

// and type inference

let mut v = Vec::new();

v.push(bar);

v.push(foo);

// Loops

for item in v.iter() {

println!(“Hello World {}”, item);

}

}

7 of 38

enum Color {

Red,

Green,

Blue,

Grey(u8),

}

struct Dog {

name: String,

color: Color,

}

trait Pet {

fn greet(&self) -> String;

}

impl Pet for Dog {

fn greet(&self) -> String {

// Last expr is return value

format!(

“Good dog, {}”,

self.name

)

}

}

8 of 38

let mut v = Vec::new()

… // Add elements to v

for item in v.iter() {

v.push(5);

}

pub fn push(&mut self, new: T)

pub fn iter(&self) -> Iter<'_, T>

error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable

--> src/main.rs:5:9

|

4 | for item in v.iter() {

| --------

| |

| immutable borrow occurs here

| immutable borrow later used here

5 | v.push(5);

| ^^^^^^^^^ mutable borrow occurs here

9 of 38

Rust Primer: More resources

The Rust Programming Language Book: https://doc.rust-lang.org/book/

Help Forum: https://users.rust-lang.org

10 of 38

rustc Architecture

11 of 38

rustc Architecture: Overview

  • Rust-specific frontend, LLVM backend
  • DAG of queries instead of passes
  • Many static analysis passes
  • 5 code representations: AST, HIR, THIR, MIR, LLVM IR
    • Each suited for some purpose
    • Progressively more low-level

Rust

LLVM

12 of 38

Queries instead of Stages!

  • Directed-acyclic graphs of queries
    • Except lexing/parsing, macro expansion, name resolution, LLVM
  • Queries at the granularity of “bodies”

type_check_crate()

list_of_all_hir_items()

hir(foo)

type_of(foo)

type_check_item(foo)

hir(bar)

type_of(bar)

type_check_item(bar)

13 of 38

Queries instead of Stages!

Why? Incremental Compilation!

  • Queries memoized/cached
  • Serialize cache to disk
  • Reuse for next compilation
  • Stable hashing

  • Future work: parallelization

14 of 38

rustc Task List

  • Saving compilation cache
  • LLVM IR generation
  • LLVM (more optimization, binary generation, linking, link-time optimization, …)
  • Command line args, user environment, toolchain, load compilation cache
  • Lexing, Parsing
  • Macro expansion, feature gating, various compiler magic
  • Type inference
  • Type checking
  • Trait solving/checking
  • Pattern exhaustiveness checking
  • Borrow checking
  • Constant evaluation
  • Rust-level Optimizations
  • Monomorphization

trait Foo { }

fn baz<T: Foo>(t: T) { }

baz(3);

baz(“hello”);

let x = 5;

match x {

0 => { }

1 => { }

2 => { }

}

std::vec::Vec<T>

Vec<u64>

Vec<String>

15 of 38

rustc Task List

  • Command line args, user environment, toolchain, load compilation cache
  • Lexing, Parsing
  • Macro expansion, feature gating, various compiler magic
  • Type inference
  • Type checking
  • Trait solving/checking
  • Pattern exhaustiveness checking
  • Borrow checking
  • Constant evaluation
  • Rust-level Optimizations
  • Monomorphization
  • Saving compilation cache
  • LLVM IR generation
  • LLVM (more optimization, binary generation, linking, link-time optimization, …)

AST

  • Represents program syntactically
  • Easy to create from source code
  • Amenable to syntactic manipulation of program

16 of 38

rustc Task List

  • Command line args, user environment, toolchain, load compilation cache
  • Lexing, Parsing
  • Macro expansion, feature gating, various compiler magic
  • Type inference
  • Type checking
  • Trait solving/checking
  • Pattern exhaustiveness checking
  • Borrow checking
  • Constant evaluation
  • Rust-level Optimizations
  • Monomorphization
  • Saving compilation cache
  • LLVM IR generation
  • LLVM (more optimization, binary generation, linking, link-time optimization, …)

AST

HIR

  • Desugared, compiler-friendly AST
  • Easy to build from AST
  • Suitable for type/trait inference/checking

17 of 38

rustc Task List

  • Command line args, user environment, toolchain, load compilation cache
  • Lexing, Parsing
  • Macro expansion, feature gating, various compiler magic
  • Type inference
  • Type checking
  • Trait solving/checking
  • Pattern exhaustiveness checking
  • Borrow checking
  • Constant evaluation
  • Rust-level Optimizations
  • Monomorphization
  • Saving compilation cache
  • LLVM IR generation
  • LLVM (more optimization, binary generation, linking, link-time optimization, …)

AST

HIR

THIR

  • Typed version of HIR
  • Easy to build from HIR, easy to convert to MIR
  • Suitable for pattern exhaustiveness checking

18 of 38

rustc Task List

  • Command line args, user environment, toolchain, load compilation cache
  • Lexing, Parsing
  • Macro expansion, feature gating, various compiler magic
  • Type inference
  • Type checking
  • Trait solving/checking
  • Pattern exhaustiveness checking
  • Borrow checking
  • Constant evaluation
  • Rust-level Optimizations
  • Monomorphization
  • Saving compilation cache
  • LLVM IR generation
  • LLVM (more optimization, binary generation, linking, link-time optimization, …)

AST

HIR

MIR

THIR

  • Control-flow graph representation of program
  • Suitable for control-flow-based analyses
    • Borrow checking
    • Dataflow for correctness checks and opts
  • Easy to monomorphize, optimize, generate code

19 of 38

20 of 38

rustc Task List

  • Command line args, user environment, toolchain, load compilation cache
  • Lexing, Parsing
  • Macro expansion, feature gating, various compiler magic
  • Type inference
  • Type checking
  • Trait solving/checking
  • Pattern exhaustiveness checking
  • Borrow checking
  • Constant evaluation
  • Rust-level Optimizations
  • Monomorphization
  • Saving compilation cache
  • LLVM IR generation
  • LLVM (more optimization, binary generation, linking, link-time optimization, …)

AST

HIR

MIR

THIR

LLVM IR

  • Standardized, can be passed to LLVM
  • Typed assembly code with side-effect info
  • Good for optimizing, code generation
  • Language agnostic

21 of 38

PL design -> Compiler performance

  • Grammar parsing complexity
  • Monomorphization
  • Expressive type system
  • Constant evaluation
  • Extensive static analyses
  • Local vs global analyses

22 of 38

PL design -> Compiler design

  • No header files -> Queries/incrementation compilation
  • Nested function definitions -> Name resolution needs to search more scopes
  • Borrow checking -> Cannot generate code until late in compilation
  • Borrow checking -> Need IR that represents control flow info
  • Monomorphization -> Want to do all analyses on generic code
  • Traits -> More advanced features require logic solver (experimental)

23 of 38

Non-textbook Choices

  • Queries instead of stages/passes
  • Grammar is not LR/LL (maybe is context-free?)
    • Worst-case asymptotic perf is bad, but in practice is ok
    • Can optimize cases people run into
    • Instead want flexibility to vary lookahead, handle special cases
  • Hand-written recursive descent parser (not parser generator)
    • Better error messages
  • So many IRs
    • Worse performance, memory usage
    • Enables many powerful static analyses, including borrow checker

24 of 38

Other Considerations For Compiler Builders

25 of 38

Besides Correctness: Development and Debugging

  • UI/UX: error messages, linting
  • Debug info
    • Name mangling
    • Existing tooling C/C++ oriented (enums, closures, traits)

26 of 38

Besides Correctness: Developer Environment

  • Tool chains (cargo)
  • IDEs: code completion, suggestions
    • Partial compilations
    • Compiling incorrect code/handling partially incorrect inputs
    • Response time important, not total compilation time
    • Rapid frequent invocations
    • Don’t actually need code gen

27 of 38

Besides Correctness: Compiler Performance

  • Time to compile
    • Check-only builds
    • Incremental compilation
    • Optimizations on/off
  • Memory usage
  • Disk usage
    • Incremental compilation cache

28 of 38

Engineering: Compatibility and Stability Guarantees

  • Rust “breakage” policy: only for correctness bugs or if rare in the wild
  • crater runs
  • Compatibilities lints

Compiling playground v0.0.1 (/playground)

warning: cannot borrow `v` as mutable because it is also borrowed as immutable

--> src/main.rs:6:4

|

3 | let shared = &v;

| -- immutable borrow occurs here

...

6 | v.push(shared.len());

| ^ ------ immutable borrow later used here

| |

| mutable borrow occurs here

|

= note: `#[warn(mutable_borrow_reservation_conflict)]` on by default

= warning: this borrowing pattern was not meant to be accepted, and may become a hard error in future

= note: for more information, see issue #59159 <https://github.com/rust-lang/rust/issues/59159>

29 of 38

Engineering: Testing and Dogfooding

  • Test suite
    • Frontend “UI” tests for most analyses
      • Checks compiler message output against expected
    • Heavy-weight tests: MIR optimization, LLVM IR, debug info
      • Checks IR or binary against expected
  • Standard practices
    • Integration, regression tests
    • Run test suite in CI
    • Require tests for new functionality
  • 3 channels: Stable, Beta, Nightly
    • About 12 weeks between nightly and stable
  • Use unstable features in rustc and standard library
    • Allows usage/testing of incomplete/unstable features
    • Allows exploration of design space

30 of 38

Engineering: Bootstrapping

  • rustc and std library implemented in Rust
  • Need rustc to build rustc…

  • Use an old compiler to build a new compiler

31 of 38

Engineering: Bootstrapping

32 of 38

Engineering: Bootstrapping

YO DAWG I HEARD YOU LIKE THE RUST COMPILER

SO I PUT SOME RUST IN YOUR RUST COMPILER SO YOU CAN COMPILE THE RUST COMPILER WHEN YOU COMPILE THE RUST COMPILER

33 of 38

Engineering: Bootstrapping Challenges

  • Compiler depends on compiler AND standard library
  • Both compiler and standard library “dogfood” unstable features
    • Want fast turnaround for “dogfood”
    • Need to be able to build with previous compiler
  • Compiler/standard library binaries needs to be reproducible
  • Want to reduce development time
  • Rust ABI is unstable: can’t mix-and-match compilers and std libraries

34 of 38

Engineering: Bootstrapping

Idea: 2-stage bootstrapping

  • Build intermediate compiler with beta compiler
  • Build new compiler with intermediate

Pros

  • Dogfood in compiler/std library after 6 weeks for previously disallowed code
  • Otherwise, can dogfood immediately
  • Reach a reproducible “fixed-point” of compilers

Cons

  • Need to build compiler twice (take a long time)
  • Standard library needs #[cfg(bootstrap)] (conditional compilation)
  • Very confusing

35 of 38

Engineering: Bootstrapping

  • Can often get away with just using Beta during development
  • Use CI to do slow, full builds
  • ./x.py tool hides most complexity

36 of 38

Engineering: Developer Considerations

  • How to attract and retain open-source contributors?
  • Time to bootstrap
  • Compute/memory/disk requirements
  • Documentation
  • https://rustc-dev-guide.rust-lang.org

37 of 38

Logistics

  • Rust is open source (most contributors are volunteers, though many compiler team members are now paid)
  • Dependence on other projects: LLVM, gcc, bin-utils, cranelift
    • Bug fixes/performance regressions take a while to stabilize
    • Other potential backends: cranelift
    • gcc/gdb/lldb/etc debug info and symbol demangling
  • Documentation: https://rustc-dev-guide.rust-lang.org/

38 of 38

Other resources