1 of 47

Programming Languages/Software Engineering

Construction of Call Graphs

Credits: Slide content contains repurposed material originally by Atanas Rountev.

2 of 47

David Grove and Craig Chambers. 2001. A framework for call graph construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6 (November 2001), 685-746.

3 of 47

Call Graphs

Widely-used representation of calling relationships

Key component of interprocedural control-flow analysis
First step toward interprocedural dataflow analysis

Why?

Credits: Slide content contains repurposed material originally by Atanas Rountev.

4 of 47

Map of what is coming next

Call graph construction for C
Call graph construction for object-oriented languages (focus on Java)

Class Hierarchy Analysis
Rapid Type Analysis

Credits: Slide content contains repurposed material originally by Atanas Rountev.

5 of 47

Call Graph Construction for C

Problem: function pointers�Examples from “Precise Call Graphs for C Programs with Function Pointers”, Ana Milanova, Atanas Rountev, and Barbara G. Ryder, International Journal of Automated Software Engineering (JASE), 2004

Credits: Slide content contains repurposed material originally by Atanas Rountev.

7 of 47

Another Example

What do we do with these function pointers?
Simple answer: any function whose address is taken could possibly be called

Can try to restrict only to functions that “match” the types at the call site; be careful, e.g., void *xmalloc(size_t)

Credits: Slide content contains repurposed material originally by Atanas Rountev.

8 of 47

Precise Resolution of Function Pointers

Need interprocedural points-to analysis

Need a call graph! (flow of pointer values through parameter passing and procedure return values)

Simple solution

Conservative call graph based on address-taken
Do points-to analysis
Re-compute the call graph using points-to information

Or, call graph construction during points-to analysis

Start without any knowledge of f.p. Calls
When a f.p. “shows up” in Pt(fp) at a call (*fp)(…), resolve it and update the points-to solution
Theoretically more precise; hard to design/implement

Credits: Slide content contains repurposed material originally by Atanas Rountev.

9 of 47

Call Graph Construction for C (cont’d)

Problems come not only from function pointers ...
Library calls: typically, the pre-compiled libraries are not analyzed

Standard libraries
Third-party libraries

A library call can trigger a callback to the program

E.g. in stdlib.h: void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*))

setjmp and longjmp

setjmp(jmp_buf env): stores the registers in env, including the stack pointer and the program counter
longjmp(env): restores the registers; execution continues after the setjump program point

Credits: Slide content contains repurposed material originally by Atanas Rountev.

10 of 47

Methods Calls (Invocations in Java)

x.m(a,b): method invocation at compile time

A target method is associated with the call
“compile-time target”, “static target”
Based on the declared type of variable x

��x has declared type A: compile-time target is A.m�javac encodes this in the bytecode (foo.class)�virtualinvoke x,<A: void m(int,int)>

Credits: Slide content contains repurposed material originally by Atanas Rountev.

11 of 47

Methods Calls (Invocations in Java)

virtualinvoke x,<A: void m(int,int)> inside the JVM

Look at the class Z of the object that x refers to at that particular moment
Search Z for a method with signature m(int,int) and return type void

If Z doesn’t have it, go to Z’s superclass, and so on upwards, until a match is found
Invoke the method on the object that is pointed-to by x

Run-time (dynamic) target: “lowest” method that matches the signature and the return type of the static target (“lowest” w.r.t. inheritance chain from Z to java.lang.Object)

This process is called virtual dispatch or method lookup

Credits: Slide content contains repurposed material originally by Atanas Rountev.

12 of 47

Call Graphs for Software Understanding

Tools for software understanding

“smart” development environments (e.g., Eclipse, NetBeans, IntelliJ, PyCharm, Cloud9, Xcode), maintenance tools, visualization tools, etc.

Credits: Slide content contains repurposed material originally by Atanas Rountev.

13 of 47

Call Graphs for Optimizations

Resolution of virtual calls

e.g. “virtualinvoke” in Java bytecode��

If the call has only one outgoing edge in the call graph, the virtual dispatch at run time will always produce the same target

So, before the program is even executed, we can replace the virtual call with a “normal” call
Or, alternatively, after the program is loaded in the JVM, do run-time analysis and optimizations
This can be part of the JIT

Credits: Slide content contains repurposed material originally by Atanas Rountev.

14 of 47

Resolution of Virtual Calls

Probably the oldest optimization problem for object-oriented languages

Smalltalk, C++, Java, many research languages
Goal 1: remove run-time virtual dispatch
Goal 2: inlining – insert the body of the called method in the caller (big performance win)��

Do this at compile time or at run time

Credits: Slide content contains repurposed material originally by Atanas Rountev.

15 of 47

The World of Call Graph Construction [Grove & Chambers 2001]

16 of 47

Class Hierarchy Analysis (CHA)

The simplest method for call graph construction

At the bottom of the previous slide

Start from main and perform reachability

The only tricky part: virtual calls

Helper function used in CHA: dispatch

Simulates the effects of the run-time virtual dispatch (a.k.a. method lookup)

Note: even CHA gets tricky in the presence of dynamic class loading, reflection, native methods, etc.

“Assumption Hierarchy for a CHA Call Graph Construction Algorithm”, Jason Sawin and Atanas Rountev, IEEE Int. Working Conference on Source Code Analysis and Manipulation, 2011

Credits: Slide content contains repurposed material originally by Atanas Rountev.

17 of 47

Class Hierarchy Analysis (CHA)

Many other usages.

Type inference:

Raffi Khatchadourian. Automated refactoring of legacy Java software to enumerated types. Automated Software Engineering, 24(4):757–787, December 2017.

Code “fact” generation, i.e., relationships between program entities including method calling:

M. R. Robillard and G. C. Murphy, "Concern graphs: finding and describing concerns using structural program dependencies," Proceedings of the 24th International Conference on Software Engineering. ICSE 2002, Orlando, FL, USA, 2002, pp. 406-416.
Raffi Khatchadourian, Awais Rashid, Hidehiko Masuhara, and Takuya Watanabe. Detecting broken pointcuts using structural commonality and degree of interest. Science of Computer Programming, 150:56–74, December 2017.

Credits: Slide content contains repurposed material originally by Atanas Rountev.

18 of 47

dispatch

dispatch(call_site s, receiver_class rc)

sig = signature_of_static_target(s)� ret = return_type_of_static_target(s)� c = rc;� while (c != null)� if class c contains a method m with � signature sig and return type ret� return m� c = superclass(c)� print “ERROR: this should be unreachable”

Why?

Credits: Slide content contains repurposed material originally by Atanas Rountev.

19 of 47

One Possible Implementation of CHA

Queue worklist
CallGraph Graph
worklist.addAtTail(main);
Graph.addNode(main)
while (worklist.notEmpty())
m = worklist.getFromHead();
process_method_body(m);

Credits: Slide content contains repurposed material originally by Atanas Rountev.

20 of 47

process_method_body(method m)

for each call site s inside m� if s is a static call or a constructor call or a call through super� add_edge(s)� if s is a virtual call v.n(…)� rcv_class = type_of(v);� for each non-abstract class c that is a subclass of � rcv_class or rcv_class itself� x = dispatch(s,c)� add_edge(s,x)

Credits: Slide content contains repurposed material originally by Atanas Rountev.

21 of 47

add_edge

add_edge(call_site s)� // for static calls, constructor calls, and calls through super� m = target(s);� if m is not in Graph� Graph.addNode(m);� worklist.addAtTail(m);� Graph.addEdge(s,m)

add_edge(call_site s, run_time_target x)� // same here

Credits: Slide content contains repurposed material originally by Atanas Rountev.

24 of 47

Example

State after processing c1

worklist = {B.m,C.m}
Graph.Nodes = {A.main, B.m, C.m}
Graph.Edges = { (c1,B.m), (c1,C.m) }

Edge (c1,C.m) is spurious (infeasible)

There is no execution of the program in which c1 invokes C.m

More precise analyses produce fewer spurious edges

Typically are more expensive (time/memory)

Credits: Slide content contains repurposed material originally by Atanas Rountev.

26 of 47

Example

State after processing c2

worklist = {B.m,C.m,A.m}
Graph.Nodes = {A.main, B.m, C.m, A.m}
Graph.Edges = {(c1,B.m),(c1,C.m), (c2,A.m),(c2,B.m),(c2,C.m) }

Edges (c2,A.m) and (c2,C.m) are spurious
After we are done with A.main, take the next method at the head of the queue

in this case B.m

Credits: Slide content contains repurposed material originally by Atanas Rountev.

28 of 47

Example

State after processing c3

worklist = {C.m,A.m,A.n,C.n}
Graph.Nodes = {A.main, B.m, C.m, A.m, C.n}
Graph.Edges = {(c1,B.m), (c1,C.m), (c2,A.m), (c2,B.m), (c2,C.m), (c3,A.n), (c3,C.n) }

Edge (c3,C.n) is spurious
The rest of the methods in the queue have empty bodies, so the rest of the algorithm doesn’t create any new edges/nodes

Credits: Slide content contains repurposed material originally by Atanas Rountev.

29 of 47

The Resulting Graph

Credits: Slide content contains repurposed material originally by Atanas Rountev.

30 of 47

Rapid Type Analysis

An analysis that is the next step after CHA

Guaranteed to produce a call graph that is a subset of the call graph produced by CHA
Still quite imprecise. There are many analyses that are better than RTA

“type analysis”

Idea: given a reference/pointer variable, try to figure out what types of objects this variable may refer/point to

Credits: Slide content contains repurposed material originally by Atanas Rountev.

31 of 47

Rapid Type Analysis

Basic insight: some classes are never instantiated in reachable methods

i.e. there is never a new X() expression

Main reason: programs that are built on top of libraries

Large parts of the library code are unused

When we try to figure out the possible run-time targets of a virtual call, we can safely ignore classes that are not instantiated

Credits: Slide content contains repurposed material originally by Atanas Rountev.

32 of 47

One Possible Implementation of RTA

Queue worklist
CallGraph Graph
worklist.addAtTail(main);
Set instantiated_classes
Map pending_call_sites
Graph.addNode(main)
while (worklist.notEmpty())
m = worklist.getFromHead();
process_method_body(m);

Credits: Slide content contains repurposed material originally by Atanas Rountev.

33 of 47

process_method_body(method m)

for each expression new X inside m� if (X ∉ instantiated_classes)� add X to instantiated_classes� resolve_pending(X)�for each call site s inside m� if s is a static call or a constructor call or a call through super� add_edge(s)� if s is a virtual call v.n(…)� rcv_class = type_of(v);� for each non-abstract class c that is a � subclass of rcv_class or rcv_class itself� process_rcv_class(c,s)

Credits: Slide content contains repurposed material originally by Atanas Rountev.

34 of 47

process_rcv_class

process_rcv_class(class c, call_site s)� x = dispatch(s,c)� if c ∈ instantiated_classes� add_edge(s,x)� else // c is not currently instantiated, � // but in the future it may be, so� // we have to remember this edge� remember (s,x) in pending(c)

Credits: Slide content contains repurposed material originally by Atanas Rountev.

35 of 47

resolve_pending(class c)

// class c became instantiated, and�// we need to add all pending edges�for each (s,x) in pending(c)� add_edge(s,x)

Called by process_method_body :�for each expression new X� if (X ∉ instantiated_classes)� add X to instantiated_classes� resolve_pending(X)

Credits: Slide content contains repurposed material originally by Atanas Rountev.

38 of 47

Example

process_rcv_class(c1,B)

Since B is instantiated, add edge (c1,B.m)

process_rcv_class(c1,C)

Since C is not instantiated, we do not add edge (c1,C.m) to the call graph

Remember (c1,C.m) in pending(C)
State after processing c1

worklist = {B.m}
Graph.Nodes = {A.main, B.m}
Graph.Edges = { (c1,B.m)}

Credits: Slide content contains repurposed material originally by Atanas Rountev.

40 of 47

Example

(c2,A): add (c2,A.m) to pending(A)
(c2,B): add (c2,B.m) to Graph
(c2,C): add (c2,C.m) to pending(C)
State after processing c2

worklist = {B.m}
Graph.Nodes = {A.main, B.m}
Graph.Edges = {(c1,B.m), (c2,B.m)}
pending(A) = {(c2,A.m)}
pending(C) = {(c1,C.m),(c2,C.m)}

Credits: Slide content contains repurposed material originally by Atanas Rountev.

42 of 47

Example

resolve_pending(A)

Graph.Nodes = {A.main, B.m, A.m}
Graph.Edges = {(c1,B.m), (c2,B.m), (c2,A.m)}
worklist = {A.m}

At call site c3: x.n()

x is of type A => A, B, or C possible
A and B are instantiated, there is no B.n; so, edge (c3,A.n) is added to the graph

A.m and A.n have empty bodies, and the graph is completed

Credits: Slide content contains repurposed material originally by Atanas Rountev.

43 of 47

RTA vs. CHA

44 of 47

RTA vs. CHA

The key advantage: RTA was able to determine that C is never instantiated in reachable methods

This means that C.m and C.n can never be targets

Of course, this is just one possible source of imprecision

Analyses that are “more aggressive” than RTA focus on some of these sources

Credits: Slide content contains repurposed material originally by Atanas Rountev.

45 of 47

Some Existing Analyses

Credits: Slide content contains repurposed material originally by Atanas Rountev.

46 of 47

More Existing Analyses

Credits: Slide content contains repurposed material originally by Atanas Rountev.

47 of 47

Class Analysis

Class analysis: given a reference variable x, what are the classes of the objects that x may refer to?

a.k.a. “type analysis” (e.g., RTA)
After a class analysis, it is trivial to construct the call graph

As a separate post-processing phase

Most class analyses construct the call graph on the fly during the analysis

For object-oriented languages, “call graph construction”, “class analysis”, and “type analysis” are often used as synonyms

Points-to analysis can be thought of as a particular form of class/type analysis

Next: “classic” points-to analysis, closely related to 0-CFA type analysis (see two slides earlier)

Credits: Slide content contains repurposed material originally by Atanas Rountev.

1 of 47

2 of 47

3 of 47

4 of 47

5 of 47

6 of 47

7 of 47

8 of 47

9 of 47

10 of 47

11 of 47

12 of 47

13 of 47

14 of 47

15 of 47

16 of 47

17 of 47

18 of 47

19 of 47

20 of 47

21 of 47

22 of 47

23 of 47

24 of 47

25 of 47

26 of 47

27 of 47

28 of 47

29 of 47

30 of 47

31 of 47

32 of 47

33 of 47

34 of 47

35 of 47

36 of 47

37 of 47

38 of 47

39 of 47

40 of 47

41 of 47

42 of 47

43 of 47

44 of 47

45 of 47

46 of 47

47 of 47