1 of 47

Programming Languages/Software Engineering

Construction of Call Graphs

Credits: Slide content contains repurposed material originally by Atanas Rountev.

2 of 47

David Grove and Craig Chambers. 2001. A framework for call graph construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6 (November 2001), 685-746.

2

3 of 47

Call Graphs

  • Widely-used representation of calling relationships
    • Key component of interprocedural control-flow analysis
    • First step toward interprocedural dataflow analysis

3

Why?

Credits: Slide content contains repurposed material originally by Atanas Rountev.

4 of 47

Map of what is coming next

  • Call graph construction for C
  • Call graph construction for object-oriented languages (focus on Java)
    • Class Hierarchy Analysis
    • Rapid Type Analysis

4

Credits: Slide content contains repurposed material originally by Atanas Rountev.

5 of 47

Call Graph Construction for C

Problem: function pointers�Examples from “Precise Call Graphs for C Programs with Function Pointers”, Ana Milanova, Atanas Rountev, and Barbara G. Ryder, International Journal of Automated Software Engineering (JASE), 2004

5

Credits: Slide content contains repurposed material originally by Atanas Rountev.

6 of 47

6

7 of 47

Another Example

  • What do we do with these function pointers?
  • Simple answer: any function whose address is taken could possibly be called
    • Can try to restrict only to functions that “match” the types at the call site; be careful, e.g., void *xmalloc(size_t)

7

Credits: Slide content contains repurposed material originally by Atanas Rountev.

8 of 47

Precise Resolution of Function Pointers

  • Need interprocedural points-to analysis
    • Need a call graph! (flow of pointer values through parameter passing and procedure return values)
  • Simple solution
    • Conservative call graph based on address-taken
    • Do points-to analysis
    • Re-compute the call graph using points-to information
  • Or, call graph construction during points-to analysis
    • Start without any knowledge of f.p. Calls
    • When a f.p. “shows up” in Pt(fp) at a call (*fp)(…), resolve it and update the points-to solution
    • Theoretically more precise; hard to design/implement

8

Credits: Slide content contains repurposed material originally by Atanas Rountev.

9 of 47

Call Graph Construction for C (cont’d)

  • Problems come not only from function pointers ...
  • Library calls: typically, the pre-compiled libraries are not analyzed
    • Standard libraries
    • Third-party libraries
  • A library call can trigger a callback to the program
    • E.g. in stdlib.h: void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*))
  • setjmp and longjmp
    • setjmp(jmp_buf env): stores the registers in env, including the stack pointer and the program counter
    • longjmp(env): restores the registers; execution continues after the setjump program point

9

Credits: Slide content contains repurposed material originally by Atanas Rountev.

10 of 47

Methods Calls (Invocations in Java)

  • x.m(a,b): method invocation at compile time
    • A target method is associated with the call
    • compile-time target”, “static target
    • Based on the declared type of variable x

��x has declared type A: compile-time target is A.m�javac encodes this in the bytecode (foo.class)�virtualinvoke x,<A: void m(int,int)>

10

Credits: Slide content contains repurposed material originally by Atanas Rountev.

11 of 47

Methods Calls (Invocations in Java)

  • virtualinvoke x,<A: void m(int,int)> inside the JVM
    • Look at the class Z of the object that x refers to at that particular moment
    • Search Z for a method with signature m(int,int) and return type void
  • If Z doesn’t have it, go to Z’s superclass, and so on upwards, until a match is found
  • Invoke the method on the object that is pointed-to by x

Run-time (dynamic) target: “lowest” method that matches the signature and the return type of the static target (“lowest” w.r.t. inheritance chain from Z to java.lang.Object)

This process is called virtual dispatch or method lookup

11

Credits: Slide content contains repurposed material originally by Atanas Rountev.

12 of 47

Call Graphs for Software Understanding

  • Tools for software understanding
    • “smart” development environments (e.g., Eclipse, NetBeans, IntelliJ, PyCharm, Cloud9, Xcode), maintenance tools, visualization tools, etc.

12

Credits: Slide content contains repurposed material originally by Atanas Rountev.

13 of 47

Call Graphs for Optimizations

  • Resolution of virtual calls
    • e.g. “virtualinvoke” in Java bytecode����
  • If the call has only one outgoing edge in the call graph, the virtual dispatch at run time will always produce the same target
    • So, before the program is even executed, we can replace the virtual call with a “normal” call
    • Or, alternatively, after the program is loaded in the JVM, do run-time analysis and optimizations
    • This can be part of the JIT

13

Credits: Slide content contains repurposed material originally by Atanas Rountev.

14 of 47

Resolution of Virtual Calls

  • Probably the oldest optimization problem for object-oriented languages
    • Smalltalk, C++, Java, many research languages
    • Goal 1: remove run-time virtual dispatch
    • Goal 2: inlining – insert the body of the called method in the caller (big performance win)�������

14

Do this at compile time or at run time

Credits: Slide content contains repurposed material originally by Atanas Rountev.

15 of 47

The World of Call Graph Construction [Grove & Chambers 2001]

15

16 of 47

Class Hierarchy Analysis (CHA)

  • The simplest method for call graph construction
    • At the bottom of the previous slide
  • Start from main and perform reachability
    • The only tricky part: virtual calls
  • Helper function used in CHA: dispatch
    • Simulates the effects of the run-time virtual dispatch (a.k.a. method lookup)
  • Note: even CHA gets tricky in the presence of dynamic class loading, reflection, native methods, etc.
    • “Assumption Hierarchy for a CHA Call Graph Construction Algorithm”, Jason Sawin and Atanas Rountev, IEEE Int. Working Conference on Source Code Analysis and Manipulation, 2011

16

Credits: Slide content contains repurposed material originally by Atanas Rountev.

17 of 47

Class Hierarchy Analysis (CHA)

  • Many other usages.
    • Type inference:
      • Raffi Khatchadourian. Automated refactoring of legacy Java software to enumerated types. Automated Software Engineering, 24(4):757–787, December 2017.
    • Code “fact” generation, i.e., relationships between program entities including method calling:
      • M. R. Robillard and G. C. Murphy, "Concern graphs: finding and describing concerns using structural program dependencies," Proceedings of the 24th International Conference on Software Engineering. ICSE 2002, Orlando, FL, USA, 2002, pp. 406-416.
      • Raffi Khatchadourian, Awais Rashid, Hidehiko Masuhara, and Takuya Watanabe. Detecting broken pointcuts using structural commonality and degree of interest. Science of Computer Programming, 150:56–74, December 2017.

17

Credits: Slide content contains repurposed material originally by Atanas Rountev.

18 of 47

dispatch

dispatch(call_site s, receiver_class rc)

sig = signature_of_static_target(s)� ret = return_type_of_static_target(s)� c = rc;� while (c != null)� if class c contains a method m with � signature sig and return type ret� return m� c = superclass(c)� print “ERROR: this should be unreachable”

18

Why?

Credits: Slide content contains repurposed material originally by Atanas Rountev.

19 of 47

One Possible Implementation of CHA

  1. Queue worklist
  2. CallGraph Graph
  3. worklist.addAtTail(main);
  4. Graph.addNode(main)
  5. while (worklist.notEmpty())
  6. m = worklist.getFromHead();
  7. process_method_body(m);

19

Credits: Slide content contains repurposed material originally by Atanas Rountev.

20 of 47

process_method_body(method m)

for each call site s inside m� if s is a static call or a constructor call or a call through super� add_edge(s)� if s is a virtual call v.n(…)� rcv_class = type_of(v);� for each non-abstract class c that is a subclass of � rcv_class or rcv_class itself� x = dispatch(s,c)� add_edge(s,x)

20

Credits: Slide content contains repurposed material originally by Atanas Rountev.

21 of 47

add_edge

add_edge(call_site s)� // for static calls, constructor calls, and calls through super� m = target(s);� if m is not in Graph� Graph.addNode(m);� worklist.addAtTail(m);� Graph.addEdge(s,m)

add_edge(call_site s, run_time_target x)� // same here

21

Credits: Slide content contains repurposed material originally by Atanas Rountev.

22 of 47

Example

22

23 of 47

Example

23

24 of 47

Example

  • State after processing c1
    • worklist = {B.m,C.m}
    • Graph.Nodes = {A.main, B.m, C.m}
    • Graph.Edges = { (c1,B.m), (c1,C.m) }
  • Edge (c1,C.m) is spurious (infeasible)
    • There is no execution of the program in which c1 invokes C.m
  • More precise analyses produce fewer spurious edges
    • Typically are more expensive (time/memory)

24

Credits: Slide content contains repurposed material originally by Atanas Rountev.

25 of 47

Example

25

26 of 47

Example

  • State after processing c2
    • worklist = {B.m,C.m,A.m}
    • Graph.Nodes = {A.main, B.m, C.m, A.m}
    • Graph.Edges = {(c1,B.m),(c1,C.m), (c2,A.m),(c2,B.m),(c2,C.m) }
  • Edges (c2,A.m) and (c2,C.m) are spurious
  • After we are done with A.main, take the next method at the head of the queue
    • in this case B.m

26

Credits: Slide content contains repurposed material originally by Atanas Rountev.

27 of 47

Example

27

28 of 47

Example

  • State after processing c3
    • worklist = {C.m,A.m,A.n,C.n}
    • Graph.Nodes = {A.main, B.m, C.m, A.m, C.n}
    • Graph.Edges = {(c1,B.m), (c1,C.m), (c2,A.m), (c2,B.m), (c2,C.m), (c3,A.n), (c3,C.n) }
  • Edge (c3,C.n) is spurious
  • The rest of the methods in the queue have empty bodies, so the rest of the algorithm doesn’t create any new edges/nodes

28

Credits: Slide content contains repurposed material originally by Atanas Rountev.

29 of 47

The Resulting Graph

29

Credits: Slide content contains repurposed material originally by Atanas Rountev.

30 of 47

Rapid Type Analysis

  • An analysis that is the next step after CHA
    • Guaranteed to produce a call graph that is a subset of the call graph produced by CHA
    • Still quite imprecise. There are many analyses that are better than RTA
  • “type analysis”
    • Idea: given a reference/pointer variable, try to figure out what types of objects this variable may refer/point to

30

Credits: Slide content contains repurposed material originally by Atanas Rountev.

31 of 47

Rapid Type Analysis

  • Basic insight: some classes are never instantiated in reachable methods
    • i.e. there is never a new X() expression
  • Main reason: programs that are built on top of libraries
    • Large parts of the library code are unused
  • When we try to figure out the possible run-time targets of a virtual call, we can safely ignore classes that are not instantiated

31

Credits: Slide content contains repurposed material originally by Atanas Rountev.

32 of 47

One Possible Implementation of RTA

  1. Queue worklist
  2. CallGraph Graph
  3. worklist.addAtTail(main);
  4. Set instantiated_classes
  5. Map pending_call_sites
  6. Graph.addNode(main)
  7. while (worklist.notEmpty())
  8. m = worklist.getFromHead();
  9. process_method_body(m);

32

Credits: Slide content contains repurposed material originally by Atanas Rountev.

33 of 47

process_method_body(method m)

for each expression new X inside m� if (X ∉ instantiated_classes)� add X to instantiated_classes� resolve_pending(X)�for each call site s inside m� if s is a static call or a constructor call or a call through super� add_edge(s)� if s is a virtual call v.n(…)� rcv_class = type_of(v);� for each non-abstract class c that is a � subclass of rcv_class or rcv_class itself� process_rcv_class(c,s)

33

Credits: Slide content contains repurposed material originally by Atanas Rountev.

34 of 47

process_rcv_class

process_rcv_class(class c, call_site s)� x = dispatch(s,c)� if c ∈ instantiated_classes� add_edge(s,x)� else // c is not currently instantiated, � // but in the future it may be, so� // we have to remember this edge� remember (s,x) in pending(c)

34

Credits: Slide content contains repurposed material originally by Atanas Rountev.

35 of 47

resolve_pending(class c)

// class c became instantiated, and�// we need to add all pending edges�for each (s,x) in pending(c)� add_edge(s,x)

Called by process_method_body :�for each expression new X� if (X ∉ instantiated_classes)� add X to instantiated_classes� resolve_pending(X)

35

Credits: Slide content contains repurposed material originally by Atanas Rountev.

36 of 47

Example

36

37 of 47

37

38 of 47

Example

  • process_rcv_class(c1,B)
    • Since B is instantiated, add edge (c1,B.m)
  • process_rcv_class(c1,C)
    • Since C is not instantiated, we do not add edge (c1,C.m) to the call graph
  • Remember (c1,C.m) in pending(C)
  • State after processing c1
    • worklist = {B.m}
    • Graph.Nodes = {A.main, B.m}
    • Graph.Edges = { (c1,B.m)}

38

Credits: Slide content contains repurposed material originally by Atanas Rountev.

39 of 47

39

40 of 47

Example

  • (c2,A): add (c2,A.m) to pending(A)
  • (c2,B): add (c2,B.m) to Graph
  • (c2,C): add (c2,C.m) to pending(C)
  • State after processing c2
    • worklist = {B.m}
    • Graph.Nodes = {A.main, B.m}
    • Graph.Edges = {(c1,B.m), (c2,B.m)}
    • pending(A) = {(c2,A.m)}
    • pending(C) = {(c1,C.m),(c2,C.m)}

40

Credits: Slide content contains repurposed material originally by Atanas Rountev.

41 of 47

41

42 of 47

Example

  • resolve_pending(A)
    • Graph.Nodes = {A.main, B.m, A.m}
    • Graph.Edges = {(c1,B.m), (c2,B.m), (c2,A.m)}
    • worklist = {A.m}
  • At call site c3: x.n()
    • x is of type A => A, B, or C possible
    • A and B are instantiated, there is no B.n; so, edge (c3,A.n) is added to the graph
  • A.m and A.n have empty bodies, and the graph is completed

42

Credits: Slide content contains repurposed material originally by Atanas Rountev.

43 of 47

RTA vs. CHA

43

44 of 47

RTA vs. CHA

  • The key advantage: RTA was able to determine that C is never instantiated in reachable methods
    • This means that C.m and C.n can never be targets
  • Of course, this is just one possible source of imprecision
    • Analyses that are “more aggressive” than RTA focus on some of these sources

44

Credits: Slide content contains repurposed material originally by Atanas Rountev.

45 of 47

Some Existing Analyses

45

Credits: Slide content contains repurposed material originally by Atanas Rountev.

46 of 47

More Existing Analyses

46

Credits: Slide content contains repurposed material originally by Atanas Rountev.

47 of 47

Class Analysis

  • Class analysis: given a reference variable x, what are the classes of the objects that x may refer to?
    • a.k.a. “type analysis” (e.g., RTA)
    • After a class analysis, it is trivial to construct the call graph
      • As a separate post-processing phase
  • Most class analyses construct the call graph on the fly during the analysis
    • For object-oriented languages, “call graph construction”, “class analysis”, and “type analysis” are often used as synonyms
  • Points-to analysis can be thought of as a particular form of class/type analysis
    • Next: “classic” points-to analysis, closely related to 0-CFA type analysis (see two slides earlier)

47

Credits: Slide content contains repurposed material originally by Atanas Rountev.