Next-gen Haskell Compilation Techniques
Csaba Hruska
Overview
The GRIN compiler project & vision
My experience with GHC internals
My approach to Haskell compilation & GHC reuse
My approach for Haskell compilation
This talk is not a GHC Dev 101
GHC has its own design; its decisions were made decades ago
It is important to think freely about Haskell compilation
(IMO) GHC reached a local optimum many years ago; the compiler pipeline needs to be redesigned to make progress
GHC's unique design makes it hard to utilize mainstream compiler technology
GHC history
Haskell origin
RAM size of the era
research questions/focus (90s, 2000s): pure vs IO, exceptions, concurrency, types
technology constraints
how it shaped the compiler architecture
GHC today
RAM size today and trends
CPU trends
today's feasibility constraints
which constraints are outdated?
GHC compiler pipeline
each compilation stage (IR) focuses on a single aspect
Aspects of PL/compiler development
Surface Language: PL research, type systems, module systems, syntax, metaprogramming
IR/optimizer/codegen: the IR type system's purpose is to catch bugs during IR transformations (reduce miscompilation)
IR types help compiler engineers debug the compiler
RTS: GC, profiler, threading, exceptions, multi-core evaluator support
Types can help here also: encode RTS implementation/design invariants
Use metaprogramming to generate specialized code instead of manual code specialization (multi-core × profiling × calling conventions × etc.)
Focus: Haskell compiler backend
In the community, more developers are working on the surface language and the type system
Only a few devs work on the low-level code generator, optimizer, and RTS
Now I'll focus on the low-level part of the compiler pipeline
Compiler backend concerns
Compiler technology advancements
Theory:
Practice: efficient implementation of sub-problems (libraries, DSL compilers)
What areas could be improved in GHC
The GHC developer experience can make a large impact on the future of Haskell
easier to develop → more developers → more issues solved → better user experience → more users → more GHC developers
Improve: Compiler Developer Experience (DX)
DX: make reasoning easy, compositional design
IR semantics: interpreter
IR analysis: developer’s choice of technology
Important: do not push GHC's technical debt onto the curious mind
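Treating the interpreter as the IR's semantics can be sketched in Haskell. The tiny IR below is hypothetical (illustrative names, not GHC's actual Core/STG); the point is that every transformation can be validated against the interpreter:

```haskell
-- A minimal sketch: a definitional interpreter as executable IR semantics.
-- The IR below is hypothetical, not GHC's actual Core/STG.
data Expr
  = Lit Int
  | Var String
  | Add Expr Expr
  | Let String Expr Expr
  deriving Show

type Env = [(String, Int)]

-- The interpreter *is* the semantics: any IR transformation `t`
-- can be checked with: eval env (t e) == eval env e
eval :: Env -> Expr -> Int
eval _   (Lit n)     = n
eval env (Var x)     = maybe (error ("unbound: " ++ x)) id (lookup x env)
eval env (Add a b)   = eval env a + eval env b
eval env (Let x e b) = eval ((x, eval env e) : env) b
```

For example, `eval [] (Let "x" (Lit 2) (Add (Var "x") (Lit 3)))` evaluates to 5, and a constant-folding pass is correct exactly when it preserves this result.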
My approach
Improve: compiled code quality & program efficiency
Memory management: ASAP & Micro-mitten, CIB, Perceus
Data representation: LoCal & Gibbon
Vectorisation: HRC, ISPC
Handling of laziness: HRC, GRIN
These need a different design in the low-level part of the GHC compiler pipeline
Lightweight change: external STG link-time opt
GHC's design: compilation unit = Haskell module
NOTE: incremental compilation allows larger compilation units also
Minimal pipeline change: external STG IR export / analyze / transform / import
Does not require RTS or codegen changes.
GHC does its main optimizations on the Core IR, but its type system does not allow expressing some low-level optimizations (e.g. those related to data representation)
The low-level optimizations could be done on the external STG IR, where the compilation unit is flexible and can span several modules or packages.
Lazy Evaluation
GHC's lazy evaluation: indirect jumps, STG machine
Problem: statically unknown control flow
The optimizer and code generator rely on control flow information
GHC's solution:
Indirect jumps and the RTS/STG heap object manipulation cripple LLVM's optimization capabilities.
LLVM optimizes direct calls/jumps much better.
GRIN: defunctionalization
Defunctionalization replaces indirect calls with direct calls
Good: direct calls unleash LLVM's optimization power
Bad: it needs whole program analysis (according to Urban Boquist's PhD thesis)
IMO it is possible to relax the whole program analysis constraint
Vague idea:
GRIN: lazy evaluation + defunctionalization + LLVM
Haskell:
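Defunctionalization can be illustrated in plain Haskell. This is a hand-made sketch of what GRIN derives automatically; the names (`Fun`, `apply`) are illustrative:

```haskell
-- Higher-order original: the call `f x` is an indirect call
-- (the callee is unknown at compile time).
applyTwice :: (Int -> Int) -> Int -> Int
applyTwice f x = f (f x)

-- Defunctionalized: each function value becomes a data constructor,
-- and a first-order `apply` dispatches on it with direct calls only.
data Fun = Inc | DoubleIt

apply :: Fun -> Int -> Int
apply Inc      x = x + 1
apply DoubleIt x = x * 2

applyTwice' :: Fun -> Int -> Int
applyTwice' f x = apply f (apply f x)
```

Note why this needs whole-program analysis: the `Fun` type must enumerate every function that can flow into `applyTwice'`, which is only known when the whole program is visible.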
Note about lazy evaluation
Practical Haskell compilation is not about handling laziness
It is about supporting GHC Haskell primops.
Static Analysis
GHC's static analysis
demand analysis: inter-modular, via the .hi files
Souffle Datalog compiler
Soufflé is a logic programming language inspired by Datalog
https://souffle-lang.github.io/
Souffle Datalog Example
input: edge
output: reachable
calculates the reachable points of a graph (transitive closure)
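What the two Datalog rules (`reachable(x,y) :- edge(x,y)` and `reachable(x,z) :- edge(x,y), reachable(y,z)`) compute can be sketched in Haskell as a naive bottom-up fixed point. This is illustrative only; Soufflé compiles the same rules to fast, indexed C++:

```haskell
import Data.List (nub, sort)

-- Naive Datalog-style evaluation of transitive closure:
-- start from the edge facts, repeatedly apply the join rule,
-- stop when no new facts are derived (fixed point).
reachable :: [(Int, Int)] -> [(Int, Int)]
reachable edges = go edges
  where
    go facts =
      let new = nub (facts ++ [ (x, z) | (x, y) <- edges
                                       , (y', z) <- facts, y == y' ])
      in if length new == length facts
           then sort facts       -- fixed point reached
           else go new
```

For example, `reachable [(1,2),(2,3)]` derives the extra fact `(1,3)`.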
Andersen style, inclusion based points-to analysis
Implemented in Souffle Datalog
Needed for GRIN (defunctionalization) to approximate the control flow
It is FAST! Analyzed >100 STG modules in 1.7 sec (quad-core i7 CPU)
https://github.com/grin-compiler/souffle-cfa-optimization-experiment
P4F: control flow analysis
analysis result is useful for:
Paper: Pushdown Control-Flow Analysis for Free
It has not been used in production yet. How would it handle real-world Haskell programs (STG IR)?
https://github.com/csabahruska/p4f-control-flow-analysis
Memory Management
GHC RTS: memory management
Uses tracing & generational garbage collector
The garbage collector is built on top of a block layer that manages memory in units of blocks, where a block is a multiple of 4 KB in size
More info:
https://www.aosabook.org/en/ghc.html
http://ezyang.com/jfp-ghc-rts-draft.pdf
Counting Immutable Beans & Perceus
Counting Immutable Beans: Reference Counting Optimized for Purely Functional Programming
https://arxiv.org/abs/1908.05647
Perceus: Garbage Free Reference Counting with Reuse
https://www.microsoft.com/en-us/research/uploads/prod/2020/11/perceus-tr-v1.pdf
Perceus was inspired by the Counting Immutable Beans paper.
Both systems use reference counting with static analysis to find reusable memory cells.
Perceus: reuse analysis example
Perceus algorithm is used in the Koka language
The concept works for strict functional languages. (STG and GRIN are strict)
Reuse insertion in Koka:
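The effect of reuse insertion can be sketched on the classic `map` example in Haskell. GHC does not do this; the comments mark where Perceus (in Koka) would insert the refcount operations:

```haskell
-- In Koka/Perceus, when the input list is uniquely referenced
-- (refcount 1), the cons cell matched by (x : xs) is reused to
-- build (f x : ...), so the loop allocates nothing.
mapReuse :: (a -> b) -> [a] -> [b]
mapReuse f (x : xs) =
  -- Perceus would insert (conceptually):
  --   r <- drop-reuse (x:xs)        -- claim the cell if refcount == 1
  --   alloc-at r (f x : mapReuse f xs)  -- write the new cons into it
  f x : mapReuse f xs
mapReuse _ [] = []
```

The static analysis decides at compile time where `drop-reuse` is safe, which is why the result is "garbage free" without a tracing GC.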
Perceus: benchmark
ASAP & Micro-mitten
ASAP (As Static As Possible) is a compile-time automatic memory management system using whole program analysis.
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf
Micro-mitten: ASAP implementation for a simple Rust-like language
https://github.com/doctorn/micro-mitten
ASAP is not ready for production use; it needs more research.
But it fills a hole in the memory management design space.
Memory Management Design Space
ASAP
ASAP: Static Automatic Memory Management
ASAP's static analyses can reveal data lifetime properties at compile time. This on its own is useful information for the programmer.
Data Representation
GHC RTS: data representation
Structs with pointers (boxed and tagged heap objects)
Every heap object is represented uniformly
GHC RTS: data representation examples
Complex data values have lots of indirections, lots of pointers!
Source & more details: http://takenobu-hs.github.io/downloads/haskell_ghc_illustrated.pdf
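A small Haskell example of the indirection cost, using the standard `UNPACK` pragma (object counts assume a 64-bit GHC with strict fields):

```haskell
-- Boxed: the constructor holds pointers to two I# heap objects
-- (3 heap objects in total, 2 indirections to reach the payloads).
data PointBoxed = PointBoxed !Int !Int

-- Unpacked: the raw Int# payloads are stored inline in the
-- constructor (1 heap object, 0 indirections).
data PointFlat = PointFlat {-# UNPACK #-} !Int {-# UNPACK #-} !Int

dist2 :: PointFlat -> Int
dist2 (PointFlat x y) = x * x + y * y
```

`UNPACK` works per field; flattening arbitrary nested data (as Gibbon does, next slide) is beyond what GHC's Core type system can express.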
LoCal & Gibbon
Gibbon is an experimental compiler that transforms high-level functional programs to operate on serialized data (without pointers/indirections).
More info: http://iu-parfunc.github.io/gibbon/
Gibbon uses LoCal (Location Calculus). Its type system ensures transformation correctness.
LoCal allows:
LoCal & Gibbon example
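The idea can be sketched in Haskell: a pointer-based tree versus the same tree serialized into a flat preorder tag buffer, summed in one left-to-right pass without following pointers. This is hand-written for illustration; Gibbon generates such traversals automatically:

```haskell
-- Pointer-based tree (the GHC-style representation)
data Tree = Leaf Int | Node Tree Tree

-- Serialized form: a preorder stream of tags (0 = Leaf n, 1 = Node)
serialize :: Tree -> [Int]
serialize (Leaf n)   = [0, n]
serialize (Node l r) = 1 : serialize l ++ serialize r

-- Sum directly over the buffer, pointer-free, cache-friendly;
-- returns the sum and the unconsumed rest of the buffer.
sumBuf :: [Int] -> (Int, [Int])
sumBuf (0 : n : rest) = (n, rest)
sumBuf (1 : rest)     = let (sl, rest')  = sumBuf rest
                            (sr, rest'') = sumBuf rest'
                        in (sl + sr, rest'')
sumBuf _              = error "malformed buffer"
```

LoCal's location types are what make it safe to generate `sumBuf`-style code: they track where in the buffer each subtree starts and ends.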
LoCal & Gibbon benchmark
Paper (PLDI'19): LoCal: A Language for Programs Operating on Serialized Data
Michael Vollmer, Chaitanya Koparkar, Mike Rainey, Laith Sakka, Milind Kulkarni, Ryan R. Newton
Gibbon2: based on LoCal
CNF: GHC's Compact Normal Form (Compact Region)
Cap'n Proto: very efficient C++ data interchange format
Vectorisation
HRC: Intel Labs Haskell Research Compiler
Uses GHC as frontend, has custom optimizing backend
Vectorisation makes Haskell suitable for high performance computing
https://github.com/IntelLabs/flrc
http://www.leafpetersen.com/leaf/publications.htm
HRC's RTS is not tuned for laziness, but that was not the focus
paper links
(screenshot: HRC benchmark)
(screenshot: measuring the haskell gap)
HRC + FLRC = frontend + optimizer
Functional Language Research Compiler
HRC benchmark: lazy evaluation + RTS
Paper: The Intel Labs Haskell Research Compiler. Hai Liu, Neal Glew, Leaf Petersen and Todd Anderson
HRC RTS is not tuned for laziness (it was not the focus)
Computation heavy benchmark, C vs Haskell
Vectorisation makes Haskell suitable for high performance computing
Paper: Measuring the Haskell Gap. Leaf Petersen, Todd Anderson, Hai Liu and Neal Glew
ISPC: Intel SPMD Compiler
ispc is a compiler for a variant of the C programming language, with extensions for "single program, multiple data" (SPMD) programming.
compiles SPMD programs to CPU utilizing its SIMD instructions
massive speedup: 3x or more
needs a language extension: distinct (SIMD) uniform and varying values
control flow level vectorisation (through multiple function calls)
ISPC: Intel SPMD Compiler
Idea: SPMD for Haskell
Would it vectorise?
Runtime System & Tooling
GHC RTS implementation
Complex primops are written in C-- (e.g. exception handling).
The rest is written in C:
RTS subsystems: GC, thread scheduler, STM, I/O manager, logging
GHCi-specific code: linker, bytecode interpreter (should not be in the RTS; it belongs to GHCi)
The RTS depends on base: functions, data types (e.g. exceptions)
GHC RTS implementation
uses lots of conventions and CPP macros for RTS heap object manipulation
almost a mini OS, C structures with complex invariants
Problem:
RTS developers must maintain complex invariants manually
The C type system is weak
The complexity scares away developers
IMO it is impossible to make architectural changes (even for experts)
Goal: fearless refactoring
A sophisticated type system could help check RTS invariants.
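A sketch of the idea in Haskell: a GADT whose type index makes an RTS-style invariant ("only evaluated closures may be scrutinized") impossible to violate. The types and the invariant are illustrative, not GHC's actual RTS:

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

-- Closure evaluation state, lifted to the type level
data State = Thunk | Value

-- A closure indexed by its state: the index carries the invariant
data Closure (s :: State) where
  Suspended :: Int -> Closure 'Thunk   -- Int stands in for a real thunk payload
  Evaluated :: Int -> Closure 'Value

-- Forcing turns a thunk into a value; the type records the transition
force :: Closure 'Thunk -> Closure 'Value
force (Suspended n) = Evaluated (n * 2)  -- toy "evaluation"

-- Accepts only evaluated closures: applying it to a thunk is a
-- *compile-time* type error, so the invariant needs no runtime check
scrutinize :: Closure 'Value -> Int
scrutinize (Evaluated n) = n
```

The same style could encode block-allocator ownership, GC phases, or calling-convention invariants, replacing hand-maintained C conventions with checked types.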
New options for RTS implementation
Powerful system programming languages: Rust, ATS
Rust is more mainstream, but has a weaker type system
ATS is niche, but more powerful:
Web tools for visualization
HTML/CSS/JS is the most accessible UI technology.
Plenty of good libraries for visualization.
It has much better UX than printing to the terminal.
Conclusion
make experimentation easy and fun
Contact
Twitter: @csaba_hruska
Patreon: patreon.com/csaba_hruska