1 of 25

DSLs: How? (an overview)

The non-Greek-speaker’s guide to making a little DSL

Part 4: Static and Dynamic Checking

2 of 25

DSL Implementation Stages

Tokenization

Parsing

Evaluate

AST Conversion

(input)

String

RESULT

Tokenizer

error

Parse error

static check failed

dynamic check failed

Dynamic Checks

Static Checks

List<Token>

Parse Tree

AST

ValidatedAST

This slide deck!

This slide deck!

3 of 25

Checking for “wrong” programs

In most languages, not all syntactically-possible programs are acceptable� - there exist ASTs representing “wrong” programs (in various senses)� - this isn’t the grammar’s fault! Grammar is for defining a syntax� - this is about a program’s meaning (semantics), not its syntax

What constitutes a “wrong” program depends on the particular language�e.g. variables used without declaration (in Python?), types assigned to inconsistently (in JavaScript?), reading without assigning (in Java?)

Defining the notions of “wrong programs” is part of the language design.

Every notion of “wrong” comes with an opposite correctness propertyfor a program and its executions: e.g. all variables are declared before use

4 of 25

What could (should?) happen if...

new x;�set x, 42;

set y, x;

assigns to undeclared variable

new x;�new y;

set y, x;

reads uninitialised variable

String x;�x = 42;

assignment of a value to a variable with inconsistent type

String x = “42”;�int y = x + 1;

uses variable as one of another type

int x = readFile(...);

int y = 1 / x;

evaluation not always possible

void m(x:Object) {� x.foo();

evaluation not always possible

method defined for x? Could x be null?

5 of 25

Specification: what will be checked?

Defining a language’s correctness properties is crucial for programmer understanding: what they are responsible for / what they can rely on.

It’s also important to clearly define whether or not and when these notions will be checked! e.g. will this be a potential source of runtime crashes?

These decisions can also affect the language’s implementation options:� e.g. enforced types can be used for more-efficient code generation

Clear error reporting is an important concern for all programmers:� explain which rules are violated (not just what you can’t do, but why)

These considerations apply to any software you design (not just DSLs)

Clients must know what they have done wrong, and why it's wrong

6 of 25

When? Static vs. Dynamic Checking

Each defined correctness property could be checked...

  1. statically (“statically checked”): checks enforced at/before compile time. e.g. in Java: that variables are declared before use, their types are respected, called methods are defined for receiver’s static type
  2. dynamically (“dynamically checked”): checks are at runtime. e.g. in Java null pointer dereferences, array access out of bounds, division by zero
  3. not at all! e.g. in C++ reading uninitialised variables.�Getting the property right is up to the programmer!

7 of 25

DSL Implementation Stages

Tokenization

Parsing

Evaluate

AST Conversion

(input)

String

RESULT

Tokenizer

error

Parse error

static check failed

dynamic check failed

Dynamic Checks

Static Checks

List<Token>

Parse Tree

AST

ValidatedAST

(during evaluation)

(before evaluation)

This slide deck!

8 of 25

Exercise: Correctness Properties and Checks

Correctness Property

Check Statically?

Check Dynamically?

Other Notes

Think about the following (~10 mins); we’ll gather and discuss your ideas at the end

  1. Think of a correctness property: anything that could/should be checked about a program
  2. Could your correctness property be checked statically? Any difficulties/restrictions?
    • Can you think of an actual programming language which does check it statically?
  3. Could your correctness property be checked dynamically? Any difficulties/restrictions?
    • Can you think of an actual programming language which does check it dynamically?
  4. Can you think of any pros or cons to checking one way or the other?
  5. (while you have time) repeat the above steps for another correctness property!

9 of 25

Exercise: Correctness Properties and Checks (last yr.)

Correctness Property

  • No out of bounds errors�
  • Type-correctness (e.g. usage of variables)�
  • No infinite recursion�
  • Allocated memory properly freed

Check Statically?

Hard if either array size or index expression not statically known

Yes - this is done e.g. in Java

Not except in very special cases - depends on value info.

Yes, to an extent (Rust does this, but not for all memory)

Check Dynamically?

Yes - this is done e.g. in Java

�Yes - this is done in Python + Java bytecode (some checks)

Interestingly, this can't really be done either (when to fail?)

Not really - similarly to non-termination, how can we know it won't be freed eventually?

Other Notes

�This decision (both ways!) is a key reason programmers prefer certain languages�Some programs/loops are intended to run forever

In practice, the static approach is enforcing a stricter policy/property

Think about the following (~10 mins); we’ll gather and discuss your ideas at the end

  • Think of a correctness property: anything that could/should be checked about a program
  • Could your correctness property be checked statically? Any difficulties/restrictions?
    • Can you think of an actual programming language which does check it statically?
  • Could your correctness property be checked dynamically? Any difficulties/restrictions?
    • Can you think of an actual programming language which does check it dynamically?
  • Are there any pros or cons to checking one way or the other?
  • (while you have time) repeat the above steps for another correctness property!

10 of 25

Exercise: Correctness Properties and Checks (last yrs.)

Correctness Property

  • No integer overflows
  • No infinite loops��
  • No stack overflows�
  • No out of bounds errors
  • Follows some style guide

Check Statically?

No - depends on value info.

No - depends on value info.��

No - depends on value info. Related to termination.

No - depends on value info.

Yes - this concerns how code is written, not what it does!

Check Dynamically?

Yes - this is done e.g. in Rust

Not precisely - could have an upper bound allowed, but would reject some OK programs

Same comment as above.�

Yes - this happens e.g. in Java

Not easily - typically code structure isn't known at runtime

Other Notes

Optional (performance)

Not done in any common language. But e.g. SPARK/ADA makes programmers formally prove termination of their code…

In e.g. C++ there's no check

Syntactic properties might be checkable when parsing

Think about the following (~10 mins); we’ll gather and discuss your ideas at the end

  • Think of a correctness property: anything that could/should be checked about a program
  • Could your correctness property be checked statically? Any difficulties/restrictions?
    • Can you think of an actual programming language which does check it statically?
  • Could your correctness property be checked dynamically? Any difficulties/restrictions?
    • Can you think of an actual programming language which does check it dynamically?
  • Are there any pros or cons to checking one way or the other?
  • (while you have time) repeat the above steps for another correctness property!

11 of 25

Three choices for what to check, and how:

Both statically and dynamically there are three key options with respect to any correctness property:

  • check it and halt: if the property is violated, generate a suitably clear error message. Make clear which rule has been violated (think: Consistency).
  • check it and adapt : if the property is violated, attempt to apply a change to the program to fix it. e.g. type coercions in JavaScript, boxing/unboxing in Java (e.g. Integer i = 42; int j = i; )
  • don’t check it (checked at another time or not at all)

What about when to check a particular property?...

12 of 25

Which could be checked statically?

new x;�set x, 42;

set y, x;

assigns to undeclared variable

new x;�new y;

set y, x;

reads uninitialised variable

String x;�x = 42;

assignment of a value to a variable with inconsistent type

String x = “42”;�int y = x + 1;

uses variable as one of another type

int x = readFile(...);

int y = 1 / x;

evaluation not always possible

void m(x:Object) {� x.foo();

evaluation not always possible

method defined for x? Could x be null?

If declarations static ✓

If declarations dynamic ?

What if initialised only on some control-flow paths?�Check approximately?

If types static ✓

If types dynamic ?

If types static ✓

If types dynamic ?

approximately?

approximately?

non-null static types

Java Checker Framework

13 of 25

In general, what can be checked statically?

String x = “42”;�int y = x + 1;

uses variable as one of another type

int x = readFile(...);

int y = 1 / x;

evaluation not always possible

void m(x:Object) {� x.foo();

evaluation not always possible

If statically bound ✓

If dynamically bound ?

Properties of dynamically-bound attributes cannot typically be precisely checked statically.

  • e.g. we can’t precisely check for division-by-zero errors statically because variable values are dynamically bound attributes.

One can statically approximate the necessary checks: enforce something stronger/weaker

Program Analyses: main topic �of the Part II of the course

set y, x;

reads uninitialised variable

If types static ✓

If types dynamic ?

approximately?

approximately?

14 of 25

In general, what can be checked dynamically?

String x = “42”;�int y = x + 1;

uses variable as one of another type

int x = readFile(...);

int y = 1 / x;

evaluation not always possible

void m(x:Object) {� x.foo();

evaluation not always possible

Any property can be checked dynamically, so long as the relevant attributes are represented at runtime.

  • e.g. we can dynamically check for division-by-zero as values are represented at runtime.

To dynamically check that variables are assigned before use, we need to track this information at runtime (could cost memory/performance)

set y, x;

reads uninitialised variable

If we track this information?

Dynamically checkable ✓

Dynamically checkable ✓

Dynamically checkable ✓

15 of 25

tinyVars next features: control flow and CL args

We'll add two new features to our tinyVars language:

Basic if-then-else control flow:

Command-line (CL) arguments:

$0, $1 etc. can be used as expressions for each

  • What if there are not enough?
  • What if they are not integers?

(in both cases, currently crashes)

These features together mean that a single program can have multiple different executions

(i.e. when the control flow path taken depends on CL inputs)

if y {

set x, 0;

} else {

}

print x;

treats 0 as false; non-zero as true

No optional else (but you could add it!); blocks can be empty

16 of 25

Checking multiple control flow paths

What should our checker do with branching control flow?

We have 3 main choices:

  1. Variables declared in both branches

i.e. however the program executes, declaration happened

  • Variables declared in either branch

i.e. only raise an error if we're always missing a declaration

  • Analyse each path separately

i.e. split the checking into two at every branch in the control flow.

Not a general option for real programming languages - why?

Which seems preferable to you? Why?

if $0 {

new x;

} else {

new y;

}

print x

At the end of the "then" block, x is declared (but y isn't)

At the end of the "else" block, y is declared (but x isn't)

Which variables should our checker consider declared here?

17 of 25

Checking multiple control flow paths - pros and cons

Option 1 is safe, but rejects safe programs as well! e.g.

We have 3 main choices:

  • Variables declared in both branches

i.e. however the program executes, declaration happened

  • Variables declared in either branch

i.e. only raise an error if we're always missing a declaration

  • Analyse each path separately

i.e. split the checking into two at every branch in the control flow.

Not a general option for real programming languages - why?

Not declared on both branches!

if $0 {

new x;

} else {

}

if $0 {

set x, 42;

} else {

}

print x

Option 2 only reports definitely wrong programs, but may miss faulty programs

Option 3 can't work with �general loops/recursion�(infinitely many paths!).

Even without: expensive �(exponentially many paths, in the number of branches)

18 of 25

Approximations used for Static Checking

These three design choices show up for many static checks

If: correctness property concerns dynamically-bound attributes

and: program has more than very few, simple control flow paths

then: Option 3 (checking each path) is not going to be feasible!

Instead, checker can be (overly) pessimistic or (overly) optimistic!

  1. Check a stronger property (if it passes, the original passes)

e.g. Java type-checking doesn't (try to) track runtime types

  • Check a weaker property (if it fails, then the original fails)

e.g. Intelli-J warns if you dereference a variable assigned null locally, but not if you dereference method parameters

Each way can be a valid design choice (more on this in Part II!)

19 of 25

Example Correctness Property: declare before assign

One potential correctness property for our tinyVars language so far:

Variables are always declared by the time they are assigned.

Right now, we’re not checking it.

Currently no checks: option.

Let’s instead implement option 1.

We can reuse the evaluation techniques from Lecture Deck 3 to implement a static check.

That is, we do one of:

1. add a recursive AST method �2. define a new visitor (better!)

What information needs to be carried along in our traversal?

set x, 42

set y, 7

print x

assigns to undeclared variable

assigns to undeclared variable

Could crash or get unexpected behaviour: e.g. program prints 7 !

20 of 25

Implementation Strategy

How could we define a specific visitor for enforcing our correctness property (variables declared before assigned) ?

  • e.g. could use String as return type: we will either return an empty string (representing no error), or an error message. Design choice: report first error, or report all errors? Depends on the check; decide what is most clear.
  • We could use a Set<Name> parameter to track currently-declared variables. Java’s Sets are mutable; modifying the set in a (recursive) call will persist a side-effect once we return!

As usual, you need to think very carefully when you program using side-effects; consider cloning if in doubt / you prefer

21 of 25

Exercise: Implementing Static Checks (15-20 mins)

Our example program should generate an error before it runs.

(No more weird behaviour here)

Let’s extend our correctness property to rule out:

  • assigning without declaring
  • using var without declaring
  • using var without assigning

Multiple checks can be made with one more-complex visitor, or with multiple separate visitors. Start with:

set x, 42

set y, 7

print x

assigns to undeclared variable

assigns to undeclared variable

https://github.students.cs.ubc.ca/CPSC410-2023W-T2/tinyvars

Optionally: check out checker-exercises branch for this!

  1. Think: what state will your visitor(s) need to carry around?
  2. How should these properties interact with aliasing?
  3. What approach will you take for if-then-else control flow?
  4. Implement your planned checks.
  5. Does your implementation make any (unchecked) assumptions?
  6. Can you think of corner cases?

22 of 25

Illustrative Test Cases (what do you get?)

set x, 42;

set x, 42;

print x;

new x;�print x;

print x;

set x, y;

new x;�print x;

new x;�new y;�alias y, x;�set x, 42;�undef x;�print y;

if $0 {

new x;

} else {

}

if $0 {

set x, 42;

} else {

}

print x;

Can you think of your own?

23 of 25

Dynamic Checks �(implemented already?)

Suppose we want to check the same correctness properties dynamically; i.e. ruling out:

  • assigning without declaring
  • using var without declaring
  • using var without assigning

Recall: we can only check a property dynamically if the runtime state tracks (or is instrumented to track) the attributes relevant for the check.

For each of the three correctness properties opposite:

  1. Is there enough information tracked in the runtime states to enforce the property?
  2. If so, what check(s) would you make in terms of the maps?�If not, what extra information would you need to track?
  3. Implement these checks as a part of AST evaluation
  4. Do your answers to 2. depend on any prior design decisions in tinyVars (hint: initialisation)?

Recall: tinyVars runtime states

24 of 25

Summary: Static vs Dynamic Checking

Static Checking

  • Check property is true before running the program
  • Only possible precisely for properties in terms of statically-bound attributes
  • Typically implemented via a traversal of the AST (e.g. with a suitable Visitor)
  • Can justify simpler/more efficient program evaluation (properties can be assumed)

Dynamic Checking

  • Check while running program
  • Can check more-flexible properties: attributes bound statically/dynamically (or both!)
  • Information must be present at runtime (this may require instrumentation of states)
  • Typically no way to check property for all executions of a program (just the current one)
  • May slow performance

25 of 25

Learning Objectives re: Static and Dynamic Checking

(more-specific and complementary to LO III from the first lecture; also starting to touch on LO 4 from the Course Syllabus)��By the end of this course, you will be able to...

  1. contrast the trade-offs between static and dynamic checking, identifying the consequences of each (or neither) for users and language implementers.
  2. justify whether a correctness property can be checked statically / dynamically.
  3. implement static and dynamic checkers for particular correctness properties.
  4. design test-cases to illustrate relevant corner-cases for a correctness property.