1 of 29

Lecture 19

Implementing Programming Languages

Programming Languages

UW CSE 341 - Autumn 2022

2 of 29

Updates

  • Homework 5 due Friday!

3 of 29

Implement-A-Language: Standard Workflow

"(fun x -> x + x) 4"

concrete syntax (string / file contents)

Type-Checking

Possible type

errors / warnings

Call

Function

+

Constant

4

x

x

x

Var

Var

abstract syntax tree (AST)

Rest of Implementation

AST that type-checks

Parsing

Possible syntax

errors / warnings

4 of 29

Workflow, In Words

  • Program starts as concrete syntax (just a string of characters)
  • Parsing turns concrete syntax into an abstract syntax tree (AST)
    • Parser rejects syntactically incorrect programs, e.g., x ) y / z
  • Type checker analyzes the AST to ensure program type checks
    • Type checker rejects type-incorrect programs, e.g., 0 + true
  • Rest of the implementation provides some way to “run” the program

5 of 29

Interpreter vs. Compiler

“Rest of implementation” takes AST and gives a way to “run the program”

Two fundamental approaches to implementing a language LBI (language being implemented)

  • Interpret: In another language M (metalanguage), write an interpreter
    • Better English words: evaluator, executor, program-runner, …
    • Takes a program in LBI and produces an answer
  • Compile: In another language M, write a compiler to a third language T (target) that has already been implemented
    • Better English words: translator, converter, …
    • Translation must preserve meaning (equivalence!)

Crucial to keep LBI, M, T straight

6 of 29

Reality can be more complex

  • Folks mix-and-match these techniques in practice
  • Consider typical Java system (Racket is similar):
    • javac : compiles Java source to Java bytecode
      • java : interprets Java bytecode BUT
        • JVM JIT : dynamically compiles performance-critical bytecode down to assembly
          • CPU : interprets assembly BUT
            • (internally) translates (compiles) assembly to microcode instructions
              • ...

7 of 29

Sermon: Languages Aren’t Their Implementations

Interpreter versus compiler versus combinations is about a particular language implementation, not the language definition

  • Therefore, no such thing as a “compiled” or “interpreted” language
    • LBI can have multiple implementations of either sort, or a mix
  • Implementation approach is independent of semantics
    • Programs in LBI can’t tell if they’re being interpreted or compiled
  • Some folks will say “C is faster than Java because it’s compiled”
    • This is nonsense! (But don’t be a jerk and point it out rudely! :-))
    • Correct take: “C programs are often faster when compiled”
    • There are C interpreters! Very useful for debugging etc.

8 of 29

Implement-A-Language: Standard Workflow

"(fun x -> x + x) 4"

concrete syntax (string / file contents)

Parsing

Possible syntax

errors / warnings

Type-Checking

Possible type

errors / warnings

Rest of Implementation

AST that type-checks

Call

Function

+

Constant

4

x

x

x

Var

Var

abstract syntax tree (AST)

9 of 29

Skipping Parsing

  • If implementing PL LBI in PL M, we can skip parsing: just raw ASTs!
    • Embeds LBI programs as trees in M
    • Less painful than it sounds in OCaml, Racket, etc.
  • Define LBI’s abstract syntax as a variant in M

  • Example LBI program

(struct call …)

(struct function …)

(struct var …)

(call (function (list "x")
                (add (var "x") (var "x")))
      (const 4))

This builds the same AST as in the workflow diagram: Call( Function( x, +( Var x, Var x ) ), Constant 4 )

10 of 29

We’ve already done this for arithmetic!

M = Racket

LBI = arithmetic

  • Arithmetic programs written using Racket structs (see code)
  • Interpreter is eval-exp-return-value (sketched below)
    • Takes an LBI expression and produces an LBI value
  • Arithmetic (LBI) code is Racket (M) data
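
For concreteness, a minimal sketch in the same spirit (not the exact course code; struct and field names are illustrative):

(struct const (int) #:transparent)    ; e.g., (const 7)
(struct add (e1 e2) #:transparent)    ; e.g., (add (const 3) (const 4))

; Interpreter: takes an LBI (arithmetic) expression and produces the answer
(define (eval-exp-return-value e)
  (cond [(const? e) (const-int e)]
        [(add? e) (+ (eval-exp-return-value (add-e1 e))
                     (eval-exp-return-value (add-e2 e)))]
        [#t (error "not a legal arithmetic expression")]))

(eval-exp-return-value (add (const 3) (const 4)))  ; => 7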

11 of 29

Recap of our plan

  • Avoid parsing by just using (abstract) syntax of LBI as structs
    • Previous LBI = arithmetic
    • Homework 5, LBI = MUPL (“Made Up Programming Language”)
  • Write LBI programs using struct constructors in Racket
  • Implement interpreter for LBI as a (recursive) Racket function

12 of 29

Syntax Errors vs. Type Errors

Subtle but important distinction:

  • Our interpreters will assume input is a valid AST
    • Makes sense to assume a parser would check and ensure this
    • If this assumption is violated, it’s okay to crash / give a strange error message
    • In HW 5, do not explicitly check for such errors
  • Our interpreters will explicitly check program is dynamically type correct
    • For example, code does not try to call a boolean like a function
    • This kind of error reflects a programmer mistake “at the LBI level”
    • Give good error messages in terms of the LBI!

13 of 29

Syntax Errors

Lots of possible Racket values that make no sense in terms of LBI syntax

Okay to crash the interpreter in such cases

Do not test for these or try to handle them (that would be the parser’s job); some examples follow
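
For instance (purely illustrative, reusing the struct names from the sketch above):

; Racket happily builds this value, but it is not legal LBI syntax:
(define nonsense (add 3 "dog"))       ; raw Racket values where ASTs are expected
; (eval-exp-return-value nonsense)    ; may crash with a confusing Racket error (that is fine)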

14 of 29

Interpreter “return type”

  • We used to “return an int” for the arithmetic language interpreter (e.g., 7)
    • From now on we will ALWAYS “return an expression” (e.g., (const 7))
    • A value in LBI
  • Be careful that returned expressions are really “done evaluating”, i.e., values
  • This lets us add more kinds of values to the language
    • booleans, strings, etc.
    • pairs of values, closures, ...
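
With the arithmetic structs from the earlier sketch, the new convention looks roughly like this (a sketch, not HW5 code):

(define (eval-exp e)
  (cond [(const? e) e]   ; a const is already a value: return the expression itself
        [(add? e) (const (+ (const-int (eval-exp (add-e1 e)))
                            (const-int (eval-exp (add-e2 e)))))]
        [#t (error "not a legal LBI expression")]))

(eval-exp (add (const 3) (const 4)))   ; => (const 7), not 7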

15 of 29

Adding Booleans

As before, there are “syntax errors” we will just ignore.

But now there is also a new kind of error possible (both are sketched below)...
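
For example (a sketch; the bool struct name is an assumption, not necessarily HW5's):

(struct bool (b) #:transparent)   ; an LBI boolean holding a Racket boolean

; A "syntax error" we just ignore (not a legal LBI AST; the interpreter may crash):
;   (add (const 1) "hello")
; The new kind of error: a perfectly legal AST that misuses values at the LBI level:
;   (add (const 1) (bool #t))     ; adds a number and a boolean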

16 of 29

Dynamic Type Checking

  • Syntactically correct ASTs must not crash the interpreter

  • Something else is wrong with such programs: they manipulate values incorrectly
    • For example, trying to add a number and a boolean
  • We must explicitly check for these problems and report an error
    • Error message should be phrased in terms of the LBI
    • Must check everywhere a particular type of value is needed!
    • Must check after evaluation of a subexpression
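
A sketch of the add case with an explicit check (struct names as in the earlier sketches; note the error message is phrased in terms of the LBI):

(define (eval-exp e)
  (cond [(const? e) e]
        [(bool? e) e]
        [(add? e)
         (let ([v1 (eval-exp (add-e1 e))]    ; evaluate subexpressions first,
               [v2 (eval-exp (add-e2 e))])   ; then check the resulting values
           (if (and (const? v1) (const? v2))
               (const (+ (const-int v1) (const-int v2)))
               (error "LBI addition applied to non-number")))]
        [#t (error "not a legal LBI expression")]))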

17 of 29

Adding Variables

  • Our LBI examples so far have not had variables
    • No let-expressions, functions-with-arguments, etc.
  • Here we’ll describe what to do in English
    • Your job will be to translate this to code in Homework 5
  • Big change: interpreter maintains the dynamic environment!
  • We have been preparing for this moment since Day 1 of CSE 341 :D

18 of 29

Adding Variables

  • An environment is a map from variable names to LBI values
    • We will just use Racket strings for variable names
    • So an environment can be a Racket list of string/value pairs (see the sketch after this list)
      • Slow but will meet our needs
  • Evaluation always takes place under an environment
    • Environment is an “extra” argument to interpret an expression
    • To evaluate a variable expression: just look up its value in the environment (Day 1 fact!)
    • Most recursive calls use the same environment they were passed
    • Some cases, like let-expressions, extend the environment to evaluate the body
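
A sketch of what that could look like (reusing the const struct from the earlier sketches; the helper name envlookup is illustrative):

; One possible environment: a Racket list of pairs of a variable name (string)
; and an LBI value
(define an-env (list (cons "x" (const 4))
                     (cons "y" (const 7))))

(define (envlookup env str)
  (let ([pr (assoc str env)])
    (if pr
        (cdr pr)                                        ; found: return the LBI value
        (error "unbound variable during evaluation" str))))

(envlookup an-env "x")   ; => (const 4)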

19 of 29

Setting up the interpreter

Whole interpreter moves into the main helper function eval-under-env:

  • Recursive calls must pass the correct environment! (Can be tricky)

Top-level eval-exp makes initial call to helper with the empty environment

Grading detail: Do not move eval-under-env inside eval-exp; leaving it at top-level helps us grade. (Otherwise it would be good style.)
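
A skeleton of that setup (a sketch only; HW5 has many more cases, and the var field name is an assumption):

(struct var (string) #:transparent)   ; variable use, e.g., (var "x")

(define (eval-under-env e env)
  (cond [(var? e) (envlookup env (var-string e))]   ; look variables up in env
        [(const? e) e]
        ; ... other cases (add, functions, calls, ...) go here, each passing
        ; an appropriate environment to its recursive calls
        [#t (error "not a legal LBI expression")]))

(define (eval-exp e)
  (eval-under-env e null))   ; top-level call starts with the empty environment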

20 of 29

Closures

Most interesting part of Homework 5 is implementing first-class functions

  • With lexical scope of course!

Again, we have been preparing for this all quarter 😉

  • The semantics we learned for lexical scope and closures happens to lead naturally to an implementation

21 of 29

First-class Functions

“Magic”: how do we use the right environment for lexical scope?

  • Especially when functions can return functions, store them in pairs, etc. ?

No magic really: closures just “remember” by storing the right environment!

  • A closure has two parts: a function and an environment

To evaluate a function expression:

  • A function is not yet a value: it evaluates to a closure (which is a value)
  • Create a closure out of (1) the function and (2) the current environment
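
In code this can be as simple as the following sketch (field names are an assumption):

(struct closure (env fun) #:transparent)   ; a closure value: environment + function

; In eval-under-env, the function case just packages up the current environment:
;   [(function? e) (closure env e)]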

22 of 29

Calling Functions (call e1 e2)

  • Use current environment to evaluate e1 to a closure
    • Error if it does not evaluate to a closure! (dynamic type checking)
  • Use current environment to evaluate e2 to a value
  • Evaluate closure’s function in new environment
    • Everything from the closure’s environment PLUS
    • Mapping of function’s argument name to value from e2
    • Mapping of function’s name to the whole closure (for recursion!)
  • This is the semantics we learned, just in code now!
  • A closure’s function body is always evaluated in the closure’s (extended) env, NEVER the caller’s environment!
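
A sketch of that case, assuming illustrative structs (struct function (nameopt formal body)) and (struct call (funexp actual)) plus the closure struct from before (HW5's details differ):

; the call case inside eval-under-env:
[(call? e)
 (let ([clo (eval-under-env (call-funexp e) env)]    ; evaluate e1 in current env
       [arg (eval-under-env (call-actual e) env)])   ; evaluate e2 in current env
   (if (closure? clo)
       (let* ([fn   (closure-fun clo)]
              [env1 (cons (cons (function-formal fn) arg)       ; bind argument name
                          (closure-env clo))]                    ; onto closure's env
              [env2 (if (function-nameopt fn)                    ; bind function's own
                        (cons (cons (function-nameopt fn) clo)   ; name, for recursion
                              env1)
                        env1)])
         (eval-under-env (function-body fn) env2))               ; NOT the caller's env
       (error "LBI call applied to something that is not a closure")))]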

23 of 29

The Cost of Closures

  • Since a closure “holds on to” the environment where it was defined, we might waste space if most of the bindings are not used
    • Then again, environments are immutable, so we get lots of natural aliasing/sharing
  • Alternative used in practice: When creating a closure, store a smaller environment containing only the variables the function body might actually use
    • Keep only the bindings for the free variables of the function
    • Free variables: variables used in the body that are not bound within it (after accounting for shadowing)
    • A function body cannot possibly use any other variables

  • We’ll explore this more in a challenge problem
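
Roughly, only closure creation changes (a sketch; free-vars is a hypothetical helper returning a set of variable names, as discussed on the next slides):

; In the function case, keep only bindings for the function's free variables:
;   [(function? e)
;    (closure (filter (lambda (pr) (set-member? (free-vars e) (car pr))) env)
;             e)]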

24 of 29

Free Variables Example
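
A small illustrative example, using the structs from the earlier sketches (not necessarily the one shown in lecture):

; x is bound by the function's own argument, but y is free: its binding
; must come from the environment stored in the closure
;   (function (list "x") (add (var "x") (var "y")))
; Free variables of this function: { y }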

25 of 29

Free Variables Example

26 of 29

Computing Free Variables

  • So does the interpreter have to compute a function body’s free variables every time it creates a closure?
  • No: A function’s set of free variables is a static property, so we can (pre)compute it for each function before we start evaluation
    • Store the set of variables with the function
  • See challenge problem
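
A hypothetical sketch of such a computation (set, set-union, and set-remove come from racket/set; uses the illustrative function struct from the call-case sketch):

(define (free-vars e)
  (cond [(var? e)   (set (var-string e))]
        [(const? e) (set)]
        [(add? e)   (set-union (free-vars (add-e1 e)) (free-vars (add-e2 e)))]
        [(call? e)  (set-union (free-vars (call-funexp e))
                               (free-vars (call-actual e)))]
        [(function? e)
         (let ([fvs (set-remove (free-vars (function-body e))
                                (function-formal e))])        ; the argument is bound
           (if (function-nameopt e)
               (set-remove fvs (function-nameopt e))          ; so is the fn's own name
               fvs))]
        [#t (error "not a legal LBI expression")]))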

27 of 29

Optional: compiling closures

We have now explained how an interpreter implements closures, but what about a compiler to a target language like C/assembly that does not have closures?

  1. Create closures the same way
  2. Implement functions by having every function take an extra argument (the environment to use for evaluating the body) and having every call pass in the right environment
    • Essentially weaves all the environment passing into the compiled code because the target language’s interpreter cannot do it for us
  3. Implement free-variable lookup by using function’s extra argument
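
An illustrative sketch of the idea, written in Racket but using only top-level functions and explicit environments (standing in for a target language without closures; all names are hypothetical):

; "Compiled" function body: takes its environment as an explicit extra argument
(define (adder-code env arg)
  (+ (cdr (assoc "y" env)) arg))        ; free variable y is looked up in env

; A closure is just a pair of code and environment
(define adder-closure (cons adder-code (list (cons "y" 7))))

; A call passes the closure's environment to the code explicitly
((car adder-closure) (cdr adder-closure) 3)   ; => 10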

This compilation technique is called closure conversion. It’s cool. :)

28 of 29

Implementing LBI Macros

Remember how we are implementing languages

  • Implement PL LBI in PL M
  • Skip parsing by writing LBI ASTs directly as data-structures in M
  • Interpreter is just a recursive M function

Remember what we know about macros

  • Extend the syntax of a language
  • Macros get expanded before evaluation begins

29 of 29

Implementing LBI Macros

With this setup, we can use PL M functions that return LBI ASTs as “LBI macros”

  • When writing LBI programs, use “macros” as if they were LBI syntax
  • No change to the interpreter!
  • Helps emphasize that macros manipulate syntax

See code for examples.
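
A tiny sketch of the idea (lbi-double is a hypothetical example; the real ones are in the course code):

; A Racket (M) function that just builds an LBI AST: our "LBI macro"
(define (lbi-double e)
  (add e e))

; Use it while writing LBI programs as if it were LBI syntax; the interpreter
; never sees lbi-double, only the expanded AST:
;   (lbi-double (const 4))   expands to   (add (const 4) (const 4))
; so (eval-exp (lbi-double (const 4))) evaluates to (const 8)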

Downside:

  • They don’t get hygiene right (issues related to macro/LBI-variable shadowing)