1 of 29

Lecture 19

Implementing Programming Languages

Programming Languages

UW CSE 341 - Autumn 2022

2 of 29

Updates

  • Homework 5 due Friday!

3 of 29

Implement-A-Language: Standard Workflow

"(fun x -> x + x) 4"

concrete syntax (string / file contents)

Type-Checking

Possible type

errors / warnings

Call

Function

+

Constant

4

x

x

x

Var

Var

abstract syntax tree (AST)

Rest of Implementation

AST that type-checks

Parsing

Possible syntax

errors / warnings

4 of 29

Workflow, In Words

  • Program starts as concrete syntax (just a string of characters)
  • Parsing turns concrete syntax into an abstract syntax tree (AST)
    • Parser rejects syntactically incorrect programs, e.g., x ) y / z
  • Type checker analyzes the AST to ensure program type checks
    • Type checker rejects type-incorrect programs, e.g., 0 + true
  • Rest of the implementation provides some way to “run” the program

5 of 29

Interpreter vs. Compiler

“Rest of implementation” takes AST and gives a way to “run the program”

Two fundamental approaches to implementing a language LBI (language being implemented)

  • Interpret: In another language M (metalanguage), write an interpreter
    • Better English words: evaluator, executor, program-runner, …
    • Takes a program in LBI and produces an answer
  • Compile: In another language M, write a compiler to a third language T (target) that has already been implemented
    • Better English words: translator, converter, …
    • Translation must preserve meaning (equivalence!)

Crucial to keep LBI, M, T straight

6 of 29

Reality can be more complex

  • Folks mix-and-match these techniques in practice
  • Consider typical Java system (Racket is similar):
    • javac : compiles Java source to Java bytecode
      • java : interprets Java bytecode BUT
        • JVM JIT : dynamically compiles performance-critical bytecode down to assembly
          • CPU : interprets assembly BUT
            • (internally) translates (compiles) assembly to microcode instructions
              • ...

7 of 29

Sermon: Languages Aren’t Their Implementations

Interpreter versus compiler versus combinations is about a particular language implementation, not the language definition

  • Therefore, no such thing as a “compiled” or “interpreted” language
    • LBI can have multiple implementations of either sort, or a mix
  • Implementation approach is independent of semantics
    • Programs in LBI can’t tell if they’re being interpreted or compiled
  • Some folks will say “C is faster than Java because it’s compiled”
    • This is nonsense! (But don’t be a jerk and point it out rudely! :-))
    • Correct take: “C programs are often faster when compiled”
    • There are C interpreters! Very useful for debugging etc.

8 of 29

Implement-A-Language: Standard Workflow

"(fun x -> x + x) 4"

concrete syntax (string / file contents)

Parsing

Possible syntax

errors / warnings

Type-Checking

Possible type

errors / warnings

Rest of Implementation

AST that type-checks

Call

Function

+

Constant

4

x

x

x

Var

Var

abstract syntax tree (AST)

9 of 29

Skipping Parsing

  • If implementing PL LBI in PL M, we can skip parsing: just raw ASTs!
    • Embeds LBI programs as trees in M
    • Less painful than it sounds in OCaml, Racket, etc.
  • Define LBI’s abstract syntax as a variant in M

  • Example LBI program

(struct call …)

(struct function …)

(struct var …)

(call (function (list "x")
                (add (var "x") (var "x")))
      (const 4))

This builds the same AST as in the workflow diagram: Call( Function( x, +( Var x, Var x ) ), Constant 4 )

10 of 29

We’ve already done this for arithmetic!

M = Racket

LBI = arithmetic

  • Arithmetic programs written using Racket structs (see code)
  • Interpreter is eval-exp-return-value (sketched below)
    • Takes an LBI expression and produces an LBI value
  • Arithmetic (LBI) code is Racket (M) data
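
For concreteness, a minimal sketch in the same spirit (not the exact course code; struct and field names are illustrative):

(struct const (int) #:transparent)    ; e.g., (const 7)
(struct add (e1 e2) #:transparent)    ; e.g., (add (const 3) (const 4))

; Interpreter: takes an LBI (arithmetic) expression and produces the answer
(define (eval-exp-return-value e)
  (cond [(const? e) (const-int e)]
        [(add? e) (+ (eval-exp-return-value (add-e1 e))
                     (eval-exp-return-value (add-e2 e)))]
        [#t (error "not a legal arithmetic expression")]))

(eval-exp-return-value (add (const 3) (const 4)))  ; => 7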

11 of 29

Recap of our plan

  • Avoid parsing by just using (abstract) syntax of LBI as structs
    • Previous LBI = arithmetic
    • Homework 5, LBI = MUPL (“Made Up Programming Language”)
  • Write LBI programs using struct constructors in Racket
  • Implement interpreter for LBI as a (recursive) Racket function

12 of 29

Syntax Errors vs. Type Errors

Subtle but important distinction:

  • Our interpreters will assume input is a valid AST
    • Makes sense to assume a parser would check and ensure this
    • If this assumption is violated, it’s okay to crash / give a strange error message
    • In HW 5, do not explicitly check for such errors
  • Our interpreters will explicitly check program is dynamically type correct
    • For example, code does not try to call a boolean like a function
    • This kind of error reflects a programmer mistake “at the LBI level”
    • Give good error messages in terms of the LBI!

13 of 29

Syntax Errors

Lots of possible Racket values that make no sense in terms of LBI syntax

Okay to crash the interpreter in such cases

Do not test for these or try to handle them (that would be the parser’s job); some examples follow
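
For instance (purely illustrative, reusing the struct names from the sketch above):

; Racket happily builds this value, but it is not legal LBI syntax:
(define nonsense (add 3 "dog"))       ; raw Racket values where ASTs are expected
; (eval-exp-return-value nonsense)    ; may crash with a confusing Racket error (that is fine)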

14 of 29

Interpreter “return type”

  • We used to “return an int” for the arithmetic language interpreter (e.g., 7)
    • From now on we will ALWAYS “return an expression” (e.g., (const 7))
    • A value in LBI
  • Be careful that returned expressions are really “done evaluating”, i.e., values
  • This lets us add more kinds of values to the language
    • booleans, strings, etc.
    • pairs of values, closures, ...
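
With the arithmetic structs from the earlier sketch, the new convention looks roughly like this (a sketch, not HW5 code):

(define (eval-exp e)
  (cond [(const? e) e]   ; a const is already a value: return the expression itself
        [(add? e) (const (+ (const-int (eval-exp (add-e1 e)))
                            (const-int (eval-exp (add-e2 e)))))]
        [#t (error "not a legal LBI expression")]))

(eval-exp (add (const 3) (const 4)))   ; => (const 7), not 7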

15 of 29

Adding Booleans

As before, there are “syntax errors” we will just ignore.

But now there is also a new kind of error possible (both are sketched below)...
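
For example (a sketch; the bool struct name is an assumption, not necessarily HW5's):

(struct bool (b) #:transparent)   ; an LBI boolean holding a Racket boolean

; A "syntax error" we just ignore (not a legal LBI AST; the interpreter may crash):
;   (add (const 1) "hello")
; The new kind of error: a perfectly legal AST that misuses values at the LBI level:
;   (add (const 1) (bool #t))     ; adds a number and a boolean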

16 of 29

Dynamic Type Checking

  • Syntactically correct ASTs must not crash the interpreter

  • Something else is wrong with such programs: they manipulate values incorrectly
    • For example, trying to add a number and a boolean
  • We must explicitly check for these problems and report an error
    • Error message should be phrased in terms of the LBI
    • Must check everywhere a particular type of value is needed!
    • Must check after evaluation of a subexpression
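
A sketch of the add case with an explicit check (struct names as in the earlier sketches; note the error message is phrased in terms of the LBI):

(define (eval-exp e)
  (cond [(const? e) e]
        [(bool? e) e]
        [(add? e)
         (let ([v1 (eval-exp (add-e1 e))]    ; evaluate subexpressions first,
               [v2 (eval-exp (add-e2 e))])   ; then check the resulting values
           (if (and (const? v1) (const? v2))
               (const (+ (const-int v1) (const-int v2)))
               (error "LBI addition applied to non-number")))]
        [#t (error "not a legal LBI expression")]))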

17 of 29

Adding Variables

  • Our LBI examples so far have not had variables
    • No let-expressions, functions-with-arguments, etc.
  • Here we’ll describe what to do in English
    • Your job will be to translate this to code in Homework 5
  • Big change: interpreter maintains the dynamic environment!
  • We have been preparing for this moment since Day 1 of CSE 341 :D

18 of 29

Adding Variables

  • An environment is a map from variable names to LBI values
    • We will just use Racket strings for variable names
    • So an environment can be a Racket list of string/value pairs (see the sketch after this list)
      • Slow but will meet our needs
  • Evaluation always takes place under an environment
    • Environment is an “extra” argument to interpret an expression
    • To evaluate a variable expression: just look up its value in the environment (Day 1 fact!)
    • Most recursive calls use the same environment they were passed
    • Some cases, like let-expressions, extend the environment to evaluate the body
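
A sketch of what that could look like (reusing the const struct from the earlier sketches; the helper name envlookup is illustrative):

; One possible environment: a Racket list of pairs of a variable name (string)
; and an LBI value
(define an-env (list (cons "x" (const 4))
                     (cons "y" (const 7))))

(define (envlookup env str)
  (let ([pr (assoc str env)])
    (if pr
        (cdr pr)                                        ; found: return the LBI value
        (error "unbound variable during evaluation" str))))

(envlookup an-env "x")   ; => (const 4)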

19 of 29

Setting up the interpreter

Whole interpreter moves into the main helper function eval-under-env:

  • Recursive calls must pass the correct environment! (Can be tricky)

Top-level eval-exp makes initial call to helper with the empty environment

Grading detail: Do not move eval-under-env inside eval-exp; leaving it at top-level helps us grade. (Otherwise it would be good style.)
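
A skeleton of that setup (a sketch only; HW5 has many more cases, and the var field name is an assumption):

(struct var (string) #:transparent)   ; variable use, e.g., (var "x")

(define (eval-under-env e env)
  (cond [(var? e) (envlookup env (var-string e))]   ; look variables up in env
        [(const? e) e]
        ; ... other cases (add, functions, calls, ...) go here, each passing
        ; an appropriate environment to its recursive calls
        [#t (error "not a legal LBI expression")]))

(define (eval-exp e)
  (eval-under-env e null))   ; top-level call starts with the empty environment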

20 of 29

Closures

Most interesting part of Homework 5 is implementing first-class functions

  • With lexical scope of course!

Again, we have been preparing for this all quarter 😉

  • The semantics we learned for lexical scope and closures happens to lead naturally to an implementation

21 of 29

First-class Functions

“Magic”: how do we use the right environment for lexical scope?

  • Especially when functions can return functions, store them in pairs, etc. ?

No magic really: closures just “remember” by storing the right environment!

  • A closure has two parts: a function and an environment

To evaluate a function expression:

  • A function is not yet a value: it evaluates to a closure (which is a value)
  • Create a closure out of (1) the function and (2) the current environment
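
In code this can be as simple as the following sketch (field names are an assumption):

(struct closure (env fun) #:transparent)   ; a closure value: environment + function

; In eval-under-env, the function case just packages up the current environment:
;   [(function? e) (closure env e)]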

22 of 29

Calling Functions (call e1 e2)

  • Use current environment to evaluate e1 to a closure
    • Error if it does not evaluate to a closure! (dynamic type checking)
  • Use current environment to evaluate e2 to a value
  • Evaluate closure’s function in new environment
    • Everything from the closure’s environment PLUS
    • Mapping of function’s argument name to value from e2
    • Mapping of function’s name to the whole closure (for recursion!)
  • This is the semantics we learned, just in code now!
  • A closure’s function body is always evaluated in the closure’s (extended) env, NEVER the caller’s environment!
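
A sketch of that case, assuming illustrative structs (struct function (nameopt formal body)) and (struct call (funexp actual)) plus the closure struct from before (HW5's details differ):

; the call case inside eval-under-env:
[(call? e)
 (let ([clo (eval-under-env (call-funexp e) env)]    ; evaluate e1 in current env
       [arg (eval-under-env (call-actual e) env)])   ; evaluate e2 in current env
   (if (closure? clo)
       (let* ([fn   (closure-fun clo)]
              [env1 (cons (cons (function-formal fn) arg)       ; bind argument name
                          (closure-env clo))]                    ; onto closure's env
              [env2 (if (function-nameopt fn)                    ; bind function's own
                        (cons (cons (function-nameopt fn) clo)   ; name, for recursion
                              env1)
                        env1)])
         (eval-under-env (function-body fn) env2))               ; NOT the caller's env
       (error "LBI call applied to something that is not a closure")))]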

23 of 29

The Cost of Closures

  • Since a closure “holds on to” the environment where it was defined, we might waste space if most of the bindings are not used
    • Then again, environments are immutable, so we get lots of natural aliasing/sharing
  • Alternative used in practice: When creating a closure, store a smaller environment containing only the variables the function body might actually use
    • Keep only the bindings for the free variables of the function
    • Free variables: variables used in the body that are not bound within it (after accounting for shadowing)
    • A function body cannot possibly use any other variables

  • We’ll explore this more in a challenge problem
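
Roughly, only closure creation changes (a sketch; free-vars is a hypothetical helper returning a set of variable names, as discussed on the next slides):

; In the function case, keep only bindings for the function's free variables:
;   [(function? e)
;    (closure (filter (lambda (pr) (set-member? (free-vars e) (car pr))) env)
;             e)]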

24 of 29

Free Variables Example
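
A small illustrative example, using the structs from the earlier sketches (not necessarily the one shown in lecture):

; x is bound by the function's own argument, but y is free: its binding
; must come from the environment stored in the closure
;   (function (list "x") (add (var "x") (var "y")))
; Free variables of this function: { y }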

25 of 29

Free Variables Example

26 of 29

Computing Free Variables

  • So does the interpreter have to compute a function body’s free variables every time it creates a closure?
  • No: A function’s set of free variables is a static property, so we can (pre)compute it for each function before we start evaluation
    • Store the set of variables with the function
  • See challenge problem
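
A hypothetical sketch of such a computation (set, set-union, and set-remove come from racket/set; uses the illustrative function struct from the call-case sketch):

(define (free-vars e)
  (cond [(var? e)   (set (var-string e))]
        [(const? e) (set)]
        [(add? e)   (set-union (free-vars (add-e1 e)) (free-vars (add-e2 e)))]
        [(call? e)  (set-union (free-vars (call-funexp e))
                               (free-vars (call-actual e)))]
        [(function? e)
         (let ([fvs (set-remove (free-vars (function-body e))
                                (function-formal e))])        ; the argument is bound
           (if (function-nameopt e)
               (set-remove fvs (function-nameopt e))          ; so is the fn's own name
               fvs))]
        [#t (error "not a legal LBI expression")]))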

27 of 29

Optional: compiling closures

We have now explained how an interpreter implements closures, but what about a compiler to a target language like C/assembly that does not have closures?

  1. Create closures the same way
  2. Implement functions by having every function take an extra argument (the environment to use for evaluating the body) and having every call pass in the right environment
    • Essentially weaves all the environment passing into the compiled code because the target language’s interpreter cannot do it for us
  3. Implement free-variable lookup by using function’s extra argument
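
An illustrative sketch of the idea, written in Racket but using only top-level functions and explicit environments (standing in for a target language without closures; all names are hypothetical):

; "Compiled" function body: takes its environment as an explicit extra argument
(define (adder-code env arg)
  (+ (cdr (assoc "y" env)) arg))        ; free variable y is looked up in env

; A closure is just a pair of code and environment
(define adder-closure (cons adder-code (list (cons "y" 7))))

; A call passes the closure's environment to the code explicitly
((car adder-closure) (cdr adder-closure) 3)   ; => 10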

This compilation technique is called closure conversion. It’s cool. :)

28 of 29

Implementing LBI Macros

Remember how we are implementing languages

  • Implement PL LBI in PL M
  • Skip parsing by writing LBI ASTs directly as data-structures in M
  • Interpreter is just a recursive M function

Remember what we know about macros

  • Extend the syntax of a language
  • Macros get expanded before evaluation begins

29 of 29

Implementing LBI Macros

With this setup, we can use PL M functions that return LBI ASTs as “LBI macros”

  • When writing LBI programs, use “macros” as if they were LBI syntax
  • No change to the interpreter!
  • Helps emphasize that macros manipulate syntax

See code for examples.
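
A tiny sketch of the idea (lbi-double is a hypothetical example; the real ones are in the course code):

; A Racket (M) function that just builds an LBI AST: our "LBI macro"
(define (lbi-double e)
  (add e e))

; Use it while writing LBI programs as if it were LBI syntax; the interpreter
; never sees lbi-double, only the expanded AST:
;   (lbi-double (const 4))   expands to   (add (const 4) (const 4))
; so (eval-exp (lbi-double (const 4))) evaluates to (const 8)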

Downside:

  • They don’t get hygiene right (issues related to macro/LBI-variable shadowing)