Designing and Implementing Program Analyses
Basics of how to implement them yourself, and how it’s done in general
Plan for Today
Plan for Today
Designing a Program Analysis: Important Questions
What question is the analysis meant to try to answer? � (cf. last lecture exercise)
Plan for Today
Designing a Static Analysis : Ingredients
Simple example from our tinyVars work
0. Define your program property of interest�
Static Analysis Design Ingredients
Which variables are used before declaration?
Each σ is a set of variable names (the ones the checker considers declared)
Pessimistic analysis: Concretisation of σ is the set of all program states in which the variables contained in (at least) σ are declared
analyse function is defined per statement/expression as for DSL Checkers! e.g. analyse(σ, new x; ) = ( σ⋃{x}, "")
(see Ex. Sheet 3 solutions; also more later)
More examples next lecture!
We choose E to be a string; could also choose lists of strings, other types…
Simple example from our tinyVars work
Aim: define your program property of interest
analyse in more detail:�
Which variables are used before declaration?
analyse(σ, new x;) =
More examples next lecture!
( σ⋃{x}, "")
(in English: for declarations, we add the variable name to our state and don't produce an error)
analyse(σ, print e;) =
(in English: for print statements, we analyse the printed expression and return those results)
analyse(σ, x) = ( σ, E)
where E = if x ∈ σ: "" � otherwise: “Undeclared: ” + x + E�(in English: for uses of x, we don’t change the state but we return an error if it is not in the state)
analyse(σ, e)
Simple example from our tinyVars work
Aim: define your program property of interest
Which variables are used before declaration?
analyse(σ, x) = ( σ, E)
where E = if x ∈ σ: "" � otherwise: “Undeclared: ” + x + E�
More examples next lecture!
Simple example from our tinyVars work
Sequences of statements are handled similarly for most (forward) analyses:
analyse(σ, s1; s2) = (σ2, E1 + E2) where� (σ1, E1) = analyse(σ, s1)� (σ2, E2) = analyse(σ1, s2)
new x;
set x, 42;
print x;
set y, 7;
analyse
analyse
analyse
analyse
Not all analyses follow control flow forward!
Can you think of an analysis that would be better defined as traversing backwards �(i.e. against the program’s control flow)?
({x}, "")
({x}, "")
({x}, "")
({x}, "Undeclared: y")
({}, "")
Handling Join-points in Program Control Flow
new z;��if ($0) {
new x;
set x, 42;
} else {
new y;
set y, 7;
}
Recall: How should analyse define an abstract state for after branches?�(TinyVars checking: saw 3 Options)
σ1
σ2
analyse
σ3
analyse
σ1
σ4
analyse
σ5
analyse
σ6 = ???
Recall: "Variables are always declared before they are assigned" checker
Handling Loops in a Static Program Analysis
…��while (x) {
var y;
set y, x-1;
print x;
set x, y;
undef y;
}
…
σ1
σ2
analyse
σ3
analyse
σ4
σ5
analyse
σ6
analyse
σ7 = ???
Loops raise many design questions!
(cf. Ex. Sheet 3, Question 5)
analyse
Implementing Static Analysis for a real language
input program
Your Static Analysis
Existing Framework
Results!
AST
There are many frameworks out there!
Every IDE, compiler, etc. has these needs; many are open-source.
Here’s a course-wide shared document for library/frameworks you might use - please contribute to it!� (also for languages other than Java)
Visitor API
Tokenization
Parsing
AST Conversion
Static Checks
Plan for Today
Designing a Static Analysis : Ingredients
Designing a Static Dynamic Analysis : Ingredients
Why might we not need these?
Designing a Flow Sensitive Dynamic Analysis
Recall Program Analysis definition:
for this course we always mean an analysis which:
What precisely does that mean?
Designing a Flow Sensitive Dynamic Analysis
Your dynamic analysis is (likely) flow-sensitive if:
E.g., a dynamic analysis that counts the number of total statements executed, with σ being the running total
How do we analyse a running program? We instrument it (change it) first!
Dynamic Program Analysis
class foo {
int x;
void bar() {
if(x > 4)
{
x = x-1;
}
else
(bytecode)
javac
java
class foo {
int x;
void bar() {
if(x > 4)
{
print(..
x = x-1;
print(..
}
javac
(bytecode)
a) instrument source code
b) instrument target code / executable
a) Instrumenting source code (e.g. grab AST and modify it)
Requires less knowledge. Necessary if analysis relates to code structure.
b) Instrumenting executable (e.g. grab bytecode and modify it)
Tool for all languages targeting same representation (e.g. jbmc on Java bytecode)
java
There are also two choices for how/when to perform the analysis work
What should the instrumentation do?
class foo {
int x;
void bar() {
if(x > 4)
{
x = x-1;
}
else
(bytecode)
javac
java
class foo {
int x;
void bar() {
if(x > 4)
{
print(..
x = x-1;
print(..
}
javac
(bytecode)
a) instrument source code
b) instrument target code / executable
1) Online dynamic analysis: run the analysis as part of / alongside the program
Lets us influence execution e.g. program debugger; runtime checks (cf. Lecture 6)
java
2) Offline dynamic analysis: make the program produce a log; analyse it separately
Simpler to implement; can be more efficient (but logging may still be expensive)
Implementing Dynamic Analysis for a real language
Results!
Usually implemented via instrumentation: adding information/code at some stage:
AST
Tokenization
Parsing
AST Conversion
Static Checks
Evaluate
Your Dynamic Analysis
input program
Breaking Down Project 2 Implementation
What is the aim?
What is the setting/use-case?�.
Write the rest of the analysis
Breaking Down Project 2 Implementation
What is the aim?
What is the setting/use-case?�.
Write the rest of the analysis
Questions to ask yourselves:
Breaking Down Project 2 Implementation
What is the aim?
What is the setting/use-case?�.
Write the rest of the analysis
Approaches to answer those questions
Plan for Today
Static Analysis for Meta-properties (large scale?)
Recall: analysis of SE properties other than what the code does. e.g. project-level or software process metrics
Two common kinds of property:
How can 1. be implemented via static analysis (not a Program Analysis)? Some code-level information will be needed, typically over a lot of code!� How can we make the static analysis component efficient/reusable?
Direct approach : roll out the Visitor Pattern?
Let’s consider class dependencies as a fully-blown meta-property analysis
We’ll need static analysis part to get e.g.
Why not visit Java ASTs? We could!
But e.g. Eclipse’s Java ASTVisitor has 95 methods!; one per Java AST node.
Our meta-property analysis may be concerned with a tiny fragment of these.�If we want to run many high-level analyses (e.g. in IDE), this gets expensive...
More-Abstract Program Representations
Since these are standard problems for static / meta-property analyses, there are alternative libraries to make life easier!
For example, the javax.lang.model library provides higher-level groupings of language constructs (and visitors)
The class ElementVisitor<R,P> has only 7 visit methods! Each treats a set of language constructs uniformly.
Frameworks for Building Program Analysers!
Some PL researchers (and companies) have gone further, and built frameworks for specifying program analyses declaratively (e.g. in Datalog)
Combines two key ideas:
Engines for extraction/Datalog must be generic but can be highly-optimised.�Note: idea 1 also makes sense by itself; data extraction once-and-for-all.
Where are we now?
SE-related Analyses
Meta-Property Analysis
Value-agnostic
Program Analysis
Static Program Analysis
Dynamic Program Analysis
Value-sensitive�(symbolic execution)
Automated (Concolic) Test-case Generation
(only briefly)
(as time permits...)
What does this code do?
Why Program Analysis is hard
Type of property analysed
When the analysis runs (and on what)
Whether values are (partially) tracked
Combines a static and a dynamic analysis
✓
✓
✓