1 of 30

Designing and Implementing Program Analyses

Basics of how to implement them yourself, and how it’s done in general

2 of 30

Plan for Today

  1. High-level design common to all program analyses
  2. Static analysis ingredients + implementation
  3. Dynamic analysis ingredients + implementation
  4. (Extra) Meta-property analyses

3 of 30

Plan for Today

  • High-level design common to all program analyses
  • Static analysis ingredients + implementation
  • Dynamic analysis ingredients + implementation
  • (Extra) Meta-property analyses

4 of 30

Designing a Program Analysis: Important Questions

  • What is the aim?

What question is the analysis meant to try to answer? � (cf. last lecture exercise)

  • What is the setting/use-case?�What kinds of programs do you want to tackle? Are there specific language features which are more-or-less important in your setting?
  • What design trade-offs will we make?�Given the Impossible Four Properties we've just seen, we need some.�e.g. do we prefer false positive vs. false negatives in certain cases?
  • How will we handle loops/recursive control flow (for static analyses)?�A static analysis exploring all control flow paths won't terminate.�If we want termination, we'll need to compromise somewhere else.

5 of 30

Plan for Today

  • High-level design common to all program analyses
  • Static analysis ingredients + implementation
  • Dynamic analysis ingredients + implementation
  • (Extra) Meta-property analyses

6 of 30

Designing a Static Analysis : Ingredients

  • Define the goal: what is the property (of all executions) of a program?
    • What problem is the analysis supposed to help
  • Abstract states σ : type of information analysis tracks through program
    • can be of any chosen type, depending on analysis design
  • Error/output information E: type of information returned by analysis
  • Analysis function: define a function analyse(σ,s) for analysis steps
    • on pairs of abstract states σ and program statements s
    • returns a pair (σ',E) of resulting abstract state plus any errors
    • typically defined per-case of type of (supported) statement s
  • Concretisation function: maps abstract states to sets of program states or sets of program executions - this defines what abstract states mean
    • what does the analysis "think" is a possible state at this point?
  • (Optionally) termination strategy: for recursive control flow (e.g. loops)�- naïvely, an analysis will iterate around such loops forever...

7 of 30

Simple example from our tinyVars work

0. Define your program property of interest�

Static Analysis Design Ingredients

  • Abstract states σ : type of information analysis tracks through program
  • Error/output information E: type of information returned by analysis�
  • Analysis function: define a function analyse(σ,s) for analysis steps��
  • Concretisation function: maps abstract states to sets of program states��
  • (Optionally) termination strategy:

Which variables are used before declaration?

Each σ is a set of variable names (the ones the checker considers declared)

Pessimistic analysis: Concretisation of σ is the set of all program states in which the variables contained in (at least) σ are declared

analyse function is defined per statement/expression as for DSL Checkers! e.g. analyse(σ, new x; ) = ( σ⋃{x}, "")

(see Ex. Sheet 3 solutions; also more later)

More examples next lecture!

We choose E to be a string; could also choose lists of strings, other types…

8 of 30

Simple example from our tinyVars work

Aim: define your program property of interest

analyse in more detail:

Which variables are used before declaration?

analyse(σ, new x;) =

More examples next lecture!

( σ⋃{x}, "")

(in English: for declarations, we add the variable name to our state and don't produce an error)

analyse(σ, print e;) =

(in English: for print statements, we analyse the printed expression and return those results)

analyse(σ, x) = ( σ, E)

where E = if x σ: "" � otherwise: “Undeclared: ” + x + E�(in English: for uses of x, we don’t change the state but we return an error if it is not in the state)

analyse(σ, e)

9 of 30

Simple example from our tinyVars work

Aim: define your program property of interest

  • Defining an analysis in mathematical notation gives a high-level design which can be quite simply converted into concrete implementation(s):

Which variables are used before declaration?

analyse(σ, x) = ( σ, E)

where E = if x σ: "" � otherwise: “Undeclared: ” + x + E�

More examples next lecture!

10 of 30

Simple example from our tinyVars work

Sequences of statements are handled similarly for most (forward) analyses:

analyse(σ, s1; s2) = (σ2, E1 + E2) where� (σ1, E1) = analyse(σ, s1)� (σ2, E2) = analyse(σ1, s2)

new x;

set x, 42;

print x;

set y, 7;

analyse

analyse

analyse

analyse

Not all analyses follow control flow forward!

Can you think of an analysis that would be better defined as traversing backwards �(i.e. against the program’s control flow)?

({x}, "")

({x}, "")

({x}, "")

({x}, "Undeclared: y")

({}, "")

11 of 30

Handling Join-points in Program Control Flow

new z;��if ($0) {

new x;

set x, 42;

} else {

new y;

set y, 7;

}

Recall: How should analyse define an abstract state for after branches?�(TinyVars checking: saw 3 Options)

  • Union the sets (optimistic): we fail the 4th of the Impossible Four (i.e., sometimes say “yes” when should say “no”)
  • Intersect the sets (pessimistic): fail the 3rd of Impossible Four (i.e., sometimes say “no” when should say “yes”)
  • Continue 2 analyses separately:we may fail the 2nd of the Impossible Four (termination) for non-trivial programs

σ1

σ2

analyse

σ3

analyse

σ1

σ4

analyse

σ5

analyse

σ6 = ???

Recall: "Variables are always declared before they are assigned" checker

12 of 30

Handling Loops in a Static Program Analysis

…��while (x) {

var y;

set y, x-1;

print x;

set x, y;

undef y;

}

  • What abstract state(s) should be used at the start of the loop body?
  • How many times should the analysis visit the loop body?
  • Should we guarantee that the analysis itself terminates?
  • If so, how can we do it?
  • What abstract state(s) should be used to analyse code after the loop?

σ1

σ2

analyse

σ3

analyse

σ4

σ5

analyse

σ6

analyse

σ7 = ???

Loops raise many design questions!

(cf. Ex. Sheet 3, Question 5)

analyse

13 of 30

Implementing Static Analysis for a real language

input program

Your Static Analysis

Existing Framework

Results!

AST

There are many frameworks out there!

Every IDE, compiler, etc. has these needs; many are open-source.

Here’s a course-wide shared document for library/frameworks you might use - please contribute to it!� (also for languages other than Java)

Visitor API

Tokenization

Parsing

AST Conversion

Static Checks

  • Implement your analyse function over the program
  • Use an existing Visitor API; write a visitor for your checker.
    • Implementation of abstract states + error collection
  • Real-world static analysis tools often operate on e.g. compiled bytecode - fewer cases, applicable without source code (licence), reusable for multiple source languages.�But control flow, scoping etc. harder to recover at this level

14 of 30

Plan for Today

  • High-level design common to all program analyses
  • Static analysis ingredients + implementation
  • Dynamic analysis ingredients + implementation
  • (Extra) Meta-property analyses

15 of 30

Designing a Static Analysis : Ingredients

  • Define the goal: what is the property (of all executions) of a program?
    • What problem is the analysis supposed to help
  • Abstract states σ : type of information analysis tracks through program
    • can be of any chosen type, depending on analysis design
  • Error/output information E: type of information returned by analysis
  • Analysis function: define a function analyse(σ,s) for analysis steps
    • on pairs of abstract states σ and program statements s
    • returns a pair (σ',E) of resulting abstract state plus any errors
    • typically defined per-case of type of (supported) statement s
  • Concretisation function: maps abstract states to sets of program states or sets of program executions - this defines what abstract states mean
    • what does the analysis "think" is a possible state at this point?
  • (Optionally) termination strategy: for recursive control flow (e.g. loops)�- naïvely, an analysis will iterate around such loops forever...

16 of 30

Designing a Static Dynamic Analysis : Ingredients

  • Define the goal: what is the property (of all executions) of a program?
    • What problem is the analysis supposed to help
  • AbstractAnalysis states σ : type of information tracked through program
    • can be of any chosen type, depending on analysis design
  • Error/output information E: type of information returned by analysis
  • Analysis function: define a function analyse(σ,s) for analysis steps
    • on pairs of analysis states σ and program statements s
    • returns a pair (σ',E) of resulting abstract state plus any errors
    • typically defined per-case of type of (supported) statement s
  • Concretisation function: maps abstract states to sets of program states or sets of program executions - this defines what abstract states mean
    • what does the analysis "think" is a possible state at this point?
  • (Optionally) termination strategy: for recursive control flow (e.g. loops)�- naïvely, an analysis will iterate around such loops forever...

Why might we not need these?

17 of 30

Designing a Flow Sensitive Dynamic Analysis

Recall Program Analysis definition:

for this course we always mean an analysis which:

  • concerns what a program does: �analysis aim is to check statically/dynamically something about execution behaviours
  • is flow-sensitive: what the analysis does with a statement is dependent on control flow of the program (earlier statements, branches..)

What precisely does that mean?

18 of 30

Designing a Flow Sensitive Dynamic Analysis

Your dynamic analysis is (likely) flow-sensitive if:

  1. For the same program statement s (i.e., line of code), σ can be significantly different depending on the prior control flow through the program (how did we reach s?)
  2. (at least for some statement types) analyse(σ,s) does depends significantly on the contents of σ

E.g., a dynamic analysis that counts the number of total statements executed, with σ being the running total

  • technically satisfies 1. (e.g., different number of iterations through a loop)
  • does not satisfy 2. (analyse(σ,s) always adds 1 to σ)

19 of 30

How do we analyse a running program? We instrument it (change it) first!

Dynamic Program Analysis

class foo {

int x;

void bar() {

if(x > 4)

{

x = x-1;

}

else

(bytecode)

javac

java

class foo {

int x;

void bar() {

if(x > 4)

{

print(..

x = x-1;

print(..

}

javac

(bytecode)

a) instrument source code

b) instrument target code / executable

a) Instrumenting source code (e.g. grab AST and modify it)

Requires less knowledge. Necessary if analysis relates to code structure.

b) Instrumenting executable (e.g. grab bytecode and modify it)

Tool for all languages targeting same representation (e.g. jbmc on Java bytecode)

java

20 of 30

There are also two choices for how/when to perform the analysis work

What should the instrumentation do?

class foo {

int x;

void bar() {

if(x > 4)

{

x = x-1;

}

else

(bytecode)

javac

java

class foo {

int x;

void bar() {

if(x > 4)

{

print(..

x = x-1;

print(..

}

javac

(bytecode)

a) instrument source code

b) instrument target code / executable

1) Online dynamic analysis: run the analysis as part of / alongside the program

Lets us influence execution e.g. program debugger; runtime checks (cf. Lecture 6)

java

2) Offline dynamic analysis: make the program produce a log; analyse it separately

Simpler to implement; can be more efficient (but logging may still be expensive)

21 of 30

Implementing Dynamic Analysis for a real language

Results!

Usually implemented via instrumentation: adding information/code at some stage:

  1. Rewriting the source code (AST) to include e.g. appropriate extra code
  2. Rewriting the binary executable/bytecode
  3. very advanced; probably not for project!
  4. Rewriting the runtime (e.g. JVM!)

  • For instrumenting source code, use similar frameworks to grab and modify ASTs as for building static analyses
  • Your instrumentation might do all the work (online), or might log information to be processed separately (offline)

AST

Tokenization

Parsing

AST Conversion

Static Checks

Evaluate

Your Dynamic Analysis

input program

22 of 30

Breaking Down Project 2 Implementation

What is the aim?

What is the setting/use-case?�.

Write the rest of the analysis

23 of 30

Breaking Down Project 2 Implementation

What is the aim?

What is the setting/use-case?�.

Write the rest of the analysis

Questions to ask yourselves:

  • What state do we need to pass around?
  • What does the state mean?
  • At which expressions/statements do we need to update the state?
  • How do we get access to those expressions/statements?
  • Which framework elements are really necessary? How to use them?

24 of 30

Breaking Down Project 2 Implementation

What is the aim?

What is the setting/use-case?�.

Write the rest of the analysis

Approaches to answer those questions

  • Consider the simplest programs first
    • Add complexity later: but how can the results on simpler programs help (flashbacks to CPSC 110?)
  • Conduct the analysis on paper
  • Conduct the analysis “hackily”
    • E.g., manually add logging statements
  • Create simple AST passes/instrumentation to check how the framework works

25 of 30

Plan for Today

  • High-level design common to all program analyses
  • Static analysis ingredients + implementation
  • Dynamic analysis ingredients + implementation
  • (Extra) Meta-property analyses

26 of 30

Static Analysis for Meta-properties (large scale?)

Recall: analysis of SE properties other than what the code does. e.g. project-level or software process metrics

Two common kinds of property:

  1. software architecture�(e.g. class dependencies)
  2. software engineering process (e.g. team management, commits…)

How can 1. be implemented via static analysis (not a Program Analysis)? Some code-level information will be needed, typically over a lot of code!� How can we make the static analysis component efficient/reusable?

27 of 30

Direct approach : roll out the Visitor Pattern?

Let’s consider class dependencies as a fully-blown meta-property analysis

We’ll need static analysis part to get e.g.

  • usages of features from classes (especially if we want to count)
  • indirect dependencies (subclassing, inherited code, etc.)

Why not visit Java ASTs? We could!

But e.g. Eclipse’s Java ASTVisitor has 95 methods!; one per Java AST node.

Our meta-property analysis may be concerned with a tiny fragment of these.�If we want to run many high-level analyses (e.g. in IDE), this gets expensive...

28 of 30

More-Abstract Program Representations

Since these are standard problems for static / meta-property analyses, there are alternative libraries to make life easier!

For example, the javax.lang.model library provides higher-level groupings of language constructs (and visitors)

The class ElementVisitor<R,P> has only 7 visit methods! Each treats a set of language constructs uniformly.

List<? extends Element> getEnclosedElements()�provides functionality to recurse downwards generically (without knowing the concrete structure of the node being visited, or its children)

29 of 30

Frameworks for Building Program Analysers!

Some PL researchers (and companies) have gone further, and built frameworks for specifying program analyses declaratively (e.g. in Datalog)

Combines two key ideas:

  • Separate out data extraction step; assemble a reusable database of facts (e.g. subclasses, fields used)
  • Author specifies relations of interest in terms of these basic relations, via their own rules

Engines for extraction/Datalog must be generic but can be highly-optimised.�Note: idea 1 also makes sense by itself; data extraction once-and-for-all.

30 of 30

Where are we now?

SE-related Analyses

Meta-Property Analysis

Value-agnostic

Program Analysis

Static Program Analysis

Dynamic Program Analysis

Value-sensitive�(symbolic execution)

Automated (Concolic) Test-case Generation

(only briefly)

(as time permits...)

What does this code do?

Why Program Analysis is hard

Type of property analysed

When the analysis runs (and on what)

Whether values are (partially) tracked

Combines a static and a dynamic analysis