DSLs: How? (an overview)
The non-Greek-speaker’s guide to making a little DSL
Part 4: Static and Dynamic Checking
DSL Implementation Stages
Tokenization
Parsing
Evaluate
AST Conversion
(input)
String
RESULT
Tokenizer
error
Parse error
static check failed
dynamic check failed
Dynamic Checks
Static Checks
List<Token>
Parse Tree
AST
ValidatedAST
This slide deck!
This slide deck!
Checking for “wrong” programs
In most languages, not all syntactically-possible programs are acceptable� - there exist ASTs representing “wrong” programs (in various senses)� - this isn’t the grammar’s fault! Grammar is for defining a syntax� - this is about a program’s meaning (semantics), not its syntax
What constitutes a “wrong” program depends on the particular language�e.g. variables used without declaration (in Python?), types assigned to inconsistently (in JavaScript?), reading without assigning (in Java?)
Defining the notions of “wrong programs” is part of the language design.
Every notion of “wrong” comes with an opposite correctness property �for a program and its executions: e.g. all variables are declared before use
What could (should?) happen if...
new x;�set x, 42;
set y, x;
assigns to undeclared variable
new x;�new y;
set y, x;
reads uninitialised variable
String x;�x = 42;
assignment of a value to a variable with inconsistent type
String x = “42”;�int y = x + 1;
uses variable as one of another type
int x = readFile(...);
int y = 1 / x;
evaluation not always possible
void m(x:Object) {� x.foo();
evaluation not always possible
method defined for x? Could x be null?
Specification: what will be checked?
Defining a language’s correctness properties is crucial for programmer understanding: what they are responsible for / what they can rely on.
It’s also important to clearly define whether or not and when these notions will be checked! e.g. will this be a potential source of runtime crashes?
These decisions can also affect the language’s implementation options:� e.g. enforced types can be used for more-efficient code generation
Clear error reporting is an important concern for all programmers:� explain which rules are violated (not just what you can’t do, but why)
These considerations apply to any software you design (not just DSLs)
Clients must know what they have done wrong, and why it's wrong
When? Static vs. Dynamic Checking
Each defined correctness property could be checked...
DSL Implementation Stages
Tokenization
Parsing
Evaluate
AST Conversion
(input)
String
RESULT
Tokenizer
error
Parse error
static check failed
dynamic check failed
Dynamic Checks
Static Checks
List<Token>
Parse Tree
AST
ValidatedAST
(during evaluation)
(before evaluation)
This slide deck!
Exercise: Correctness Properties and Checks
Correctness Property
Check Statically?
Check Dynamically?
Other Notes
Think about the following (~10 mins); we’ll gather and discuss your ideas at the end
Exercise: Correctness Properties and Checks (last yr.)
Correctness Property
Check Statically?
Hard if either array size or index expression not statically known
Yes - this is done e.g. in Java
Not except in very special cases - depends on value info.
Yes, to an extent (Rust does this, but not for all memory)
Check Dynamically?
Yes - this is done e.g. in Java
�Yes - this is done in Python + Java bytecode (some checks)
Interestingly, this can't really be done either (when to fail?)
Not really - similarly to non-termination, how can we know it won't be freed eventually?
Other Notes
�This decision (both ways!) is a key reason programmers prefer certain languages��Some programs/loops are intended to run forever
In practice, the static approach is enforcing a stricter policy/property
Think about the following (~10 mins); we’ll gather and discuss your ideas at the end
Exercise: Correctness Properties and Checks (last yrs.)
Correctness Property
Check Statically?
No - depends on value info.
No - depends on value info.��
No - depends on value info. Related to termination.
No - depends on value info.
Yes - this concerns how code is written, not what it does!
Check Dynamically?
Yes - this is done e.g. in Rust
Not precisely - could have an upper bound allowed, but would reject some OK programs
Same comment as above.�
Yes - this happens e.g. in Java
Not easily - typically code structure isn't known at runtime
Other Notes
Optional (performance)
Not done in any common language. But e.g. SPARK/ADA makes programmers formally prove termination of their code…
�In e.g. C++ there's no check
Syntactic properties might be checkable when parsing
Think about the following (~10 mins); we’ll gather and discuss your ideas at the end
Three choices for what to check, and how:
Both statically and dynamically there are three key options with respect to any correctness property:
What about when to check a particular property?...
Which could be checked statically?
new x;�set x, 42;
set y, x;
assigns to undeclared variable
new x;�new y;
set y, x;
reads uninitialised variable
String x;�x = 42;
assignment of a value to a variable with inconsistent type
String x = “42”;�int y = x + 1;
uses variable as one of another type
int x = readFile(...);
int y = 1 / x;
evaluation not always possible
void m(x:Object) {� x.foo();
evaluation not always possible
method defined for x? Could x be null?
If declarations static ✓
If declarations dynamic ?
What if initialised only on some control-flow paths?�Check approximately?
If types static ✓
If types dynamic ?
If types static ✓
If types dynamic ?
approximately?
approximately?
✓non-null static types
In general, what can be checked statically?
String x = “42”;�int y = x + 1;
uses variable as one of another type
int x = readFile(...);
int y = 1 / x;
evaluation not always possible
void m(x:Object) {� x.foo();
evaluation not always possible
If statically bound ✓
If dynamically bound ?
Properties of dynamically-bound attributes cannot typically be precisely checked statically.
One can statically approximate the necessary checks: enforce something stronger/weaker
Program Analyses: main topic �of the Part II of the course
…
set y, x;
reads uninitialised variable
If types static ✓
If types dynamic ?
approximately?
approximately?
In general, what can be checked dynamically?
String x = “42”;�int y = x + 1;
uses variable as one of another type
int x = readFile(...);
int y = 1 / x;
evaluation not always possible
void m(x:Object) {� x.foo();
evaluation not always possible
Any property can be checked dynamically, so long as the relevant attributes are represented at runtime.
To dynamically check that variables are assigned before use, we need to track this information at runtime (could cost memory/performance)
…
set y, x;
reads uninitialised variable
If we track this information?
Dynamically checkable ✓
Dynamically checkable ✓
Dynamically checkable ✓
tinyVars next features: control flow and CL args
We'll add two new features to our tinyVars language:
Basic if-then-else control flow:
Command-line (CL) arguments:
$0, $1 etc. can be used as expressions for each
(in both cases, currently crashes)
These features together mean that a single program can have multiple different executions
(i.e. when the control flow path taken depends on CL inputs)
if y {
set x, 0;
} else {
}
print x;
treats 0 as false; non-zero as true
No optional else (but you could add it!); blocks can be empty
Checking multiple control flow paths
What should our checker do with branching control flow?
We have 3 main choices:
i.e. however the program executes, declaration happened
i.e. only raise an error if we're always missing a declaration
i.e. split the checking into two at every branch in the control flow.
Not a general option for real programming languages - why?
Which seems preferable to you? Why?
if $0 {
new x;
} else {
new y;
}
…
print x
At the end of the "then" block, x is declared (but y isn't)
At the end of the "else" block, y is declared (but x isn't)
Which variables should our checker consider declared here?
Checking multiple control flow paths - pros and cons
Option 1 is safe, but rejects safe programs as well! e.g.
We have 3 main choices:
i.e. however the program executes, declaration happened
i.e. only raise an error if we're always missing a declaration
i.e. split the checking into two at every branch in the control flow.
Not a general option for real programming languages - why?
Not declared on both branches!
if $0 {
new x;
} else {
}
if $0 {
set x, 42;
} else {
}
print x
Option 2 only reports definitely wrong programs, but may miss faulty programs
Option 3 can't work with �general loops/recursion�(infinitely many paths!).
Even without: expensive �(exponentially many paths, in the number of branches)
Approximations used for Static Checking
These three design choices show up for many static checks
If: correctness property concerns dynamically-bound attributes
and: program has more than very few, simple control flow paths
then: Option 3 (checking each path) is not going to be feasible!
Instead, checker can be (overly) pessimistic or (overly) optimistic!
e.g. Java type-checking doesn't (try to) track runtime types
e.g. Intelli-J warns if you dereference a variable assigned null locally, but not if you dereference method parameters
Each way can be a valid design choice (more on this in Part II!)
Example Correctness Property: declare before assign
One potential correctness property for our tinyVars language so far:
Variables are always declared by the time they are assigned.
Right now, we’re not checking it.
Currently no checks: option.
Let’s instead implement option 1.
We can reuse the evaluation techniques from Lecture Deck 3 to implement a static check.
�That is, we do one of:
1. add a recursive AST method �2. define a new visitor (better!)
What information needs to be carried along in our traversal?
set x, 42
set y, 7
print x
assigns to undeclared variable
assigns to undeclared variable
Could crash or get unexpected behaviour: e.g. program prints 7 !
Implementation Strategy
How could we define a specific visitor for enforcing our correctness property (variables declared before assigned) ?
As usual, you need to think very carefully when you program using side-effects; consider cloning if in doubt / you prefer
�
Exercise: Implementing Static Checks (15-20 mins)
Our example program should generate an error before it runs.
(No more weird behaviour here)
Let’s extend our correctness property to rule out:
Multiple checks can be made with one more-complex visitor, or with multiple separate visitors. Start with:
set x, 42
set y, 7
print x
assigns to undeclared variable
assigns to undeclared variable
https://github.students.cs.ubc.ca/CPSC410-2023W-T2/tinyvars
Optionally: check out checker-exercises branch for this!
Illustrative Test Cases (what do you get?)
set x, 42;
set x, 42;
print x;
new x;�print x;
print x;
set x, y;
new x;�print x;
new x;�new y;�alias y, x;�set x, 42;�undef x;�print y;
if $0 {
new x;
} else {
}
if $0 {
set x, 42;
} else {
}
print x;
Can you think of your own?
Dynamic Checks �(implemented already?)
Suppose we want to check the same correctness properties dynamically; i.e. ruling out:
Recall: we can only check a property dynamically if the runtime state tracks (or is instrumented to track) the attributes relevant for the check.
For each of the three correctness properties opposite:
Recall: tinyVars runtime states
Summary: Static vs Dynamic Checking
Static Checking
Dynamic Checking
Learning Objectives re: Static and Dynamic Checking
(more-specific and complementary to LO III from the first lecture; also starting to touch on LO 4 from the Course Syllabus)��By the end of this course, you will be able to...