Introduction to Part II:
Program Analysis for Software Engineering
Extracting information useful for programming;
presenting it usefully to programmers
Project 2 Overview: Program Analysis Project
Program Analysis Project: in more detail...
Discussion: what we’re looking for for Project 2
Part II: Analysis Tools for Software Engineering (SE)
SE-related Analyses
Meta-Property Analysis
Value-agnostic
Program Analysis
Static Program Analysis
Dynamic Program Analysis
Value-sensitive�(symbolic execution)
Automated (Concolic) Test-case Generation
(only briefly)
(as time permits...)
What does this code do?
Why Program Analysis is hard
Type of property analysed
When the analysis runs (and on what)
Whether values are (partially) tracked
Combines a static and a dynamic analysis
Fuzzing
(if time permits further…)
Approximations to work in practice
By the end of this (half of the) course, you will be able to...
Medium-Level Learning Objectives for Part II
Part II: Analysis Tools for Software Engineering (SE)
Main focus for Part II: widely-used varieties of SE-related analysis tools
Two top-level categories of SE-related analyses that we’ll discuss:
Analysing what a (part of a) program will do / does / can do
Can this code ever crash? Will it terminate? What results does t this function compute? Which loops have high runtime? etc…�
Basically: anything else except for what the code actually does� Which classes are coupled? Which developers contributed?
How is the code written? - coding style checkers, etc...
Example Meta-property Analysis
Analysing software engineering concerns other than what programs do
e.g. presenting histories in�distributed version control:�
Note: the colour-coded�visualisation of results!
Also Software Architecture properties (e.g. module dependencies, coupling)
Example Program Analyses: A Static Analysis Tool
Facebook’s Infer
Example Program Analyses: A Static Analysis Tool
Facebook’s Infer
Example Program Analyses: A Static Analysis Tool
Facebook’s Infer
Demo
Example Program Analyses: A Dynamic Analysis Tool
Example Program Analyses: A Dynamic Analysis Tool
py-spy: Sampling profiler for Python programs
Example Program Analyses: A Dynamic Analysis Tool
py-spy: Sampling profiler for Python programs
Demo
When the Analysis Runs: Static Analysis
Static: without running the program, cf. DSL Validation (Slide Deck 5)
Deals with structured source code (typically ASTs) without running it.
Statically analysing potential runtime behaviours (Static Program Analysis) requires considering all executions of the code of interest
Requires context about the execution up to this point: not just this line
But: tracking this statically and precisely for all executions isn't possible!
Is this call guaranteed not to cause (null pointer) exceptions?
When the Analysis Runs: Dynamic Analysis
Analyse program while running it.
e.g. simple runtime checks �(e.g. array bounds) as in lecture 9.
or more-sophisticated examples: e.g. program debugger (watches for breakpoints, instruments runtime, records information)
Q. How many distinct visualisations are in use here?
"Program Analysis" in CPSC 410
There are various definitions of Program Analysis; for this course we always mean an analysis which:
e.g. type-checking in Java is not a Program Analysis: what to check per statement is independent of control flow (only declared types)
But type-inference in a dynamically-typed language (e.g. Python) is in-general flow-sensitive:
if b:
x = 42
x = x + 7
else:
x = "foo"
x = x + "bar"
if b:
x = 42
x = x + 7
else:
x = "foo"
x = x + 7
TypeError: can only concatenate str (not "int") to str
In-class activity (3 phases: individual, group, class)
Suppose you’re given a software project, and need to take over and upgrade it. What questions about the existing code (any!) might you ask / want to get answers to?
Phase 1 (5 minutes): Brainstorm by yourself: list the questions/tools you can think of. Which do you find most useful/important in practice? Do you choose not to use some?
Phase 2 (10 minutes): find a small (3-5 people) discussion group. Start off by briefly introducing yourselves to each other! Then merge your lists and compare notes on usefulness / categorisations.
Phase 3 (10-15 minutes): We'll assemble and merge ideas from all groups.
Brainstorming Question Results
Suppose you’re given a software project, and need to take over and upgrade it. What questions about the existing code might you ask / want to get answers to? Ideas in-class + previous years:
1. How the code is written (Meta-Properties)
2. What the code does (Program Analysis!)
3. What the author intended
0. Others
Brainstorming Question Results
Suppose you’re given a software project, and need to take over and upgrade it. What questions about the existing code might you ask / want to get answers to? Ideas last class + previous years:
1. How the code is written (Meta-Properties)
2. What the code does (Program Analysis!)
3. What the author intended
0. Others
Note: 3. is not available from the code - requires author documentation
Three main kinds of property about code
A. e.g. identities of all exceptions declared thrown, compare this with all checked exceptions potentially thrown by any statement in the body.
A little more on the easy case (1)
Virtually every property of interest concerning code behaviour is an
all-executions property: in terms of the set of all possible executions (runs).
Can the method throw exceptions? Is there any execution which does?
Does the method always terminate? Are all executions terminating?
What does this loop compute in general (not just on one specific input).
By contrast, a single-execution property regards one run of the code: e.g. what does this loop compute given one specific input state? (a test case)
For the rest of the course: the hard case (2)
static void bar(int[] a, int[] b, int i, int j, int n) {
int k = 0;
while(k != n) {
a[i + k] = b[j + k]; k++;
}
}
How many executions are we talking about?
How many different possible executions does this code have?
Infinitely many! �(think: all possible input states, then run)
Can the code raise exceptions?
Under which conditions?
NullPointerException �ArrayOutOfBounds...
Does the code always terminate? In a sense, yes - if e.g. n were negative, we’d eventually go out of bounds
What does this code do? In general?
For very simple code, working this out by hand is doable - by human intuition/understanding you can see the general case(s) (but usually miss some - e.g. what if a and b are the same?) �But could you automate this analysis?
static void bar(int[] a, int[] b, int i, int j, int n) {
int k = 0;
while(k != n) {
a[i + k] = b[j + k]; k++;
}
}
An aside: bugs which aren’t bugs
Suppose you report to the developer of this code that it can throw exceptions….
Most-likely you’ll get a response saying “Of course - you’re not meant to call it like that!”.
The method has an implicit precondition which callers are supposed to satisfy, and either:
Precondition information is relevant for many program analyses; knowing assumptions to start from / which “executions” are not worth analysing; checking callers satisfy them.
Allows for a better modular analysis of the program (e.g. per method definition)
Tougher Example
What does this code do?
Can it cause exceptions? When?
Does it always terminate?
int foo(int[] a, int[] b, int i, int j) {
int k = j;
if (i < a.length / 2) {
k = foo(a, b, i*2+1, k);
}
a[i] = b[k];
k--;
if (i < a.length / 2) {
k = foo(a, b, i*2, k);
}
return k;
}
This is already very difficult by hand.
Partly: what the author was thinking is missing, and needs recovering.
And, compared with real-world code, this is still very short and simple!
Virtually every property of interest concerning code behaviour is an
all-executions property: in terms of the set of all possible executions (runs).
...and any interesting piece of code has infinitely many possible executions
Why is Program Analysis so hard?
Rice’s Theorem: "Every non-trivial semantic property of programs is undecidable" - semantic refers to what programs do (Program Analysis).
In other words, all Program Analysis problems are undecidable. But tons of program analysis tools exist... what does undecidable mean, in practice?
The Impossible Four Properties�For a “yes”/“no” Program Analysis question, no Program Analysis can exist which, achieves all of the following for all input programs:
(...and analogously for properties more-complex than “yes”/“no” ones)
Why Program Analysis is so hard, formally
For a “yes”/“no” Program Analysis question, no Program Analysis can exist which, achieves all of the following for all input programs:
The Impossible Four Criteria: Points to Note (I)
For a “yes”/“no” Program Analysis question, no Program Analysis can exist which, achieves all of the following for all input programs:
The Impossible Four Criteria: Points to Note (II)
Software Engineering Analyses can examine any property of code, projects, programming process. For this course:
Summary