1 of 193

Abstract (Last updated 2/01/18)

Abstract: In this talk, Michael Shah (“Mike”) will present an introduction to the LLVM Compiler Infrastructure. The first part of the talk discusses what LLVM is, who is using it, and why you might be interested in using it. The second part shows interactive examples, taking us from installation to the point where we build and run our first function pass. We will then build on that first function pass to start outputting some metrics about programs. Mike will also present some next steps and the resources available for working with LLVM.

Materials:

  • Please bring a laptop with LLVM 5.0 set up if you want to follow along
  • Otherwise materials will be posted to www.mshah.io

Resources:

Contact: mshah.475@gmail.com

Twitter: @MichaelShah

www.mshah.io/fosdem18.html

2 of 193

Terminology (open this in a second browser window if you like)

  • LLVM - The name of the project (not an acronym)
  • IR - Intermediate representation (human-readable, three-address, assembly-like representation)
  • Bitcode (.bc) - LLVM's binary format of the IR
  • JIT - Just-In-Time compiler
  • SSA - Static Single Assignment

3 of 193

Introduction to LLVM

(Tutorial)

Mike Shah, Ph.D.

@MichaelShah | mshah.io

February 4, 2018

60-75 Minutes for talk (plenty of time for questions)

4 of 193

Demo Time! Right from the start!

  • So you know what to pay attention to!
    • In case you (or maybe I) walked into the wrong room by accident!
    • (Or in case you are deciding whether to commit to an hour-long talk online in the distant future)
  • For those attending this talk live
    • Take a moment to introduce yourself to someone next to you.

  • demo1.sh - Print functions from program
  • demo2.sh - Print out stats
  • demo3.sh - Print out direct function callees
  • demo4.sh - Instrument code

5 of 193

Who Am I? by Mike Shah

  • Currently a lecturer at Northeastern University in Boston, Massachusetts. I teach courses in computer systems, computer graphics, and game engine development.
  • My research is in performance tools using static/dynamic analysis and software visualization.
  • I like teaching, guitar, running, weight training, and anything in computer science under the domain of graphics, visualization, concurrency, and parallelism.
  • www.mshah.io

9 of 193

This is an introduction to LLVM

We have some specific goals

  1. Figure out what LLVM is
  2. Understand how to obtain LLVM
    • (This can be a major bottleneck for students)
  3. Work through a small example with Clang
  4. Understand how to produce the demos I have already shown

11 of 193

Goals for Tomorrow

Because you’ll be ready to think about more solutions

  • Know some resources available to continue growing
  • Know some projects to try in the future
  • Be able to run through these slides again with confidence and excitement!

12 of 193

Slides and code are at the following location

www.mshah.io/fosdem18.html

13 of 193

What is LLVM?

15 of 193

LLVM (Formerly known as Low Level Virtual Machine--but it’s more!)

  • Started at the University of Illinois in 2000.
  • Chris Lattner is the lead architect
  • Backed by companies like Apple, Google, Microsoft, Intel, and more!
  • And of course--open source!

What is it that makes LLVM so great that programmers are paying attention to it?

16 of 193

The Secret Recipe

  • The exact details are listed in the research paper: https://dl.acm.org/citation.cfm?id=977673


17 of 193

Chris Lattner’s big idea

  • Lattner had been thinking about compilers while doing his graduate work.
  • Job of the compiler:
    • Translate a high-level language into machine code

C++ Source -> Lexers & parsers -> Perform standard optimizations -> Code generator -> Machine Code (1010101010101010)

23 of 193

The big idea | Around the year 2000

  • JIT compilers were gaining traction then (and continue to)
    • A virtual machine compiles code online, i.e. while the program runs
    • This online compilation means performing the same optimizations over and over again
  • So the big idea of Lattner et al. was to do the heavy lifting of optimization at compile time.
    • Perhaps using some Low Level Virtual Machine

25 of 193

The Optimizer

  • So in the middle of our compiler pipeline sits the optimizer (the stage that optimizes code), and that is our focus.

26 of 193

The optimization stage of compilers

  • Programs are typically optimized by manipulating an intermediate representation (IR) of the high-level language.
    • The intermediate representation (IR) is structurally more ‘regular’
      • That means it is easier to analyze and manipulate.
        • (Just think about how many ways you can write and interpret the same program in a high-level language)

Example of what IR instructions look like
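To make the idea of a ‘regular’ IR a little more concrete, here is a small hand-written sketch (not actual clang++ output; the function names poly and poly3 are made up for this example). The same computation is written once as a nested source expression and once in three-address style, where each statement performs a single operation and stores its result in a fresh temporary. The comments show roughly which IR instruction each statement corresponds to.

```cpp
// Source-level form: one nested expression (there are many ways to write this).
int poly(int a, int b, int c) {
    return a * b + c;
}

// Three-address form: one operation per statement, at most two operands,
// and every result goes into a new, never-reassigned temporary.
int poly3(int a, int b, int c) {
    const int t1 = a * b;   // roughly: %t1 = mul i32 %a, %b
    const int t2 = t1 + c;  // roughly: %t2 = add i32 %t1, %c
    return t2;              // roughly: ret i32 %t2
}
```

Both functions compute the same value; the second form is closer to what an optimizer actually walks over, one simple instruction at a time.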

28 of 193

How to get LLVM

(And all the tools)

I am going to run through this section very quickly!

Use it as a reference for how to set up and run the examples from this slide deck.

The LLVM project evolves at a good pace.

That is why you will want to know how to build from source to get the latest changes.

32 of 193

Where the instructions will always be

www.mshah.io/fosdem18.html

33 of 193

Downloading LLVM 5.0

  • For this talk, I am using and have tested the code with LLVM 5.0
  • This tutorial is for an x86-based Ubuntu 16.04 machine
    • A similar process should work on macOS
      • (Windows users may need some different tools; I have not built LLVM on Windows)
  • Tools you will need
    • svn
    • CMake
    • Make
    • A C/C++ compiler (mine is GCC 5.4.0)

34 of 193

Create a directory on your desktop

  • I typically append a date to the directory name

35 of 193

Subdirectories

  • Within the folder
    • A build directory where our compiled LLVM tools will go
      • (i.e. all the binaries)
    • A source directory where all of the LLVM source files live.

36 of 193

From a Terminal

  1. From within your source directory: svn co http://llvm.org/svn/llvm-project/llvm/tags/RELEASE_500/final llvm
  2. cd llvm/tools
  3. svn co http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_500/final clang
  4. cd clang/tools # (To be clear, you are now in llvm/tools/clang/tools)
  5. svn co http://llvm.org/svn/llvm-project/clang-tools-extra/tags/RELEASE_500/final extra
  6. cd ../../../../llvm/projects # (To be clear, you are now in llvm/projects)
  7. svn co http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_500/final compiler-rt
  8. cd ../../.. # (You are now back in the directory you created on your desktop)
  9. mkdir build # (if you have not already done so)
  10. cd build # (You are now in your build directory)
  11. cmake -DLLVM_TARGETS_TO_BUILD="X86" -DLLVM_TARGET_ARCH=X86 -DCMAKE_BUILD_TYPE="Release" -DLLVM_BUILD_EXAMPLES=1 -DCLANG_BUILD_EXAMPLES=1 -G "Unix Makefiles" ../source/llvm/
  12. make -j 8 # (from within the build directory, to start the build)

Now get lunch, dinner, or breakfast, depending on the speed of your CPU.

38 of 193

How will we know it worked?

  • Check your build/bin directory
  • It should look something like this
  • Note that in the examples, clang++ and the other tools are run from here!
    • If your system already has clang++ installed from a package manager, it may be a different version!

39 of 193

How to get LLVM

(Expect roughly 15-45 minutes or more to build from source, depending on your CPU and internet connection)

Assumption: We all have a working LLVM at this point

40 of 193

Our first example | Emitting LLVM’s intermediate form

  • We can output and actually look at LLVM’s intermediate form.
  • We are going to use the ‘clang++’ compiler
    • clang and clang++ are the frontends for C and C++.
    • The code they generate targets the LLVM intermediate form.
      • Let us try!

41 of 193

Our first example | Emitting LLVM’s intermediate form

  • Here is some code we can use
    • hello.cpp

43 of 193

Compile and run

Again, make sure you are using the version of clang++ that we just built!

44 of 193

Now we can use clang++ to emit LLVM IR

Our goal: get an intermediate representation

Then we can talk more about this step:

46 of 193

Now we can use clang++ to emit LLVM IR

  • Compiler arguments explained
    • -S -- only run the preprocessor and compilation steps (no object file is produced)
    • -emit-llvm -- use the LLVM representation for assembler and object files (so together with -S we get a textual .ll file)

(Use clang++ -help to see all of the options)

47 of 193

Aside: Clang++, isn’t this an LLVM talk?

  • The news, my friends, is that LLVM has expanded since the early 2000s!
  • LLVM is an umbrella project for a whole set of tools

48 of 193

LLVM Tools

49 of 193

LLVM Tools - clang/clang++

  1. clang - Clang is the C/C++ frontend compiler (LLVM is the backend)
    • Likely you have heard of or used Clang even if you did not know it!
  2. llvm-as - Takes LLVM IR in assembly (textual) form and converts it to bitcode format.
  3. llvm-dis - Converts bitcode to human-readable LLVM assembly
  4. llvm-link - Links two or more LLVM bitcode files into one file.
  5. lli - Directly executes programs in LLVM bitcode (or textual IR) using a JIT
  6. llc - Static compiler that takes LLVM input (assembly or bitcode) and generates assembly code for a target
  7. opt - LLVM analyzer and optimizer, which runs optimization and analysis passes on LLVM IR files
  8. And more

50 of 193

Wait a second, Mike!

So Clang, or perhaps other tools, can work with this “LLVM”?

Yes

No

52 of 193

Modularity

  • A key feature is that language frontends can all target the same IR
  • A single optimizer can then optimize that IR
  • And the code generator can likewise translate that IR for many different targets

sources: AOSA Book

Okay, now let us take a closer look at that IR
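A toy model of that modularity: if M frontends and N backends all agree on one IR, you write M + N components (plus one shared optimizer) instead of M x N complete compilers. Everything in this sketch is hypothetical, including the Module struct, the Frontend/Backend aliases, and the toy ‘drop nop instructions’ optimization; it is not LLVM's API, only the shape of the design.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR shared by every frontend and backend.
struct Module {
    std::vector<std::string> insts;
};

// A frontend lowers source text to IR; a backend turns IR into target code.
using Frontend = std::function<Module(const std::string&)>;
using Backend = std::function<std::string(const Module&)>;

// The single, shared optimizer only ever sees IR, never source or machine code.
Module optimize(Module m) {
    std::vector<std::string> kept;
    for (const std::string& inst : m.insts)
        if (inst != "nop")  // toy optimization: drop no-op instructions
            kept.push_back(inst);
    m.insts = std::move(kept);
    return m;
}

// Any frontend can be paired with any backend because both speak the same IR.
std::string compile(const Frontend& frontend, const Backend& backend,
                    const std::string& source) {
    return backend(optimize(frontend(source)));
}
```

Adding a new language means writing one more Frontend, adding a new target means one more Backend, and the optimizer is reused by every combination.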

54 of 193

[Pop Quiz] What does this function do?

Guesses from the audience?

  • Well, it is named “add1”
  • There are 2 i32 arguments
    • i32 = a 32-bit integer (think int)
  • Every function has a starting point (its entry basic block)
  • We store the result of an ‘add’ operation in a register
  • Then return the result as an i32 (an int)

If you can read assembly (or even C!), you can understand LLVM Intermediate Representation
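For comparison, here is source code the quiz function plausibly corresponds to (it matches the add1 example in the AOSA book's LLVM chapter), with a hand-written version of the IR in the comment. Real clang++ output will differ in details such as value names, attributes, and flags on the add instruction. Note that LLVM's integer types carry no sign: unsigned and int both become i32, and signedness shows up in the choice of instruction instead (for example sdiv versus udiv).

```cpp
// Hand-written IR for this function, in the style of the quiz:
//
//   define i32 @add1(i32 %a, i32 %b) {
//   entry:
//     %tmp1 = add i32 %a, %b
//     ret i32 %tmp1
//   }
//
// i32 is signless, so `unsigned` maps to i32 just like `int` does.
unsigned add1(unsigned a, unsigned b) {
    return a + b;
}
```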

63 of 193

LLVM’s Secret Sauce

64 of 193

LLVM IR

  • The LLVM IR can be targeted by many languages (we have discussed that)
    • It is fairly readable
    • It is also fairly writable; it is a first-class language in its own right!
      • It is well-defined! (You have an alternative to targeting ‘C’ as your IR language :) )
  • Other takeaways
    • The IR is strongly typed (e.g. i32, or pointer types such as i32*)
    • There is an unlimited number of (virtual) registers
      • You do not see a fixed set of registers like %rax, %rdx, %r15, as you would if you are used to x86
      • Rather, anything that starts with ‘%’ is a local value (a temporary, virtual register)
      • The IR uses Static Single Assignment (SSA) form.
        • This aids program analysis and compiler optimizations such as
          • Constant Propagation
          • Dead Code Elimination
          • etc.

sources: AOSA Book
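One way to see why SSA helps constant propagation: since every value is defined exactly once, a single map from value name to known constant is enough, and nothing ever has to be invalidated because a variable was reassigned (it never is). The sketch below is a toy, not LLVM's implementation. The Inst struct, the "add"/"mul" opcodes, and the operand convention (a leading digit means a non-negative literal, anything else is a value name) are assumptions made for illustration, and it only handles straight-line code. It needs C++17 for std::optional.

```cpp
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy SSA instruction: dest = op lhs, rhs   (op is "add" or "mul").
// Each dest name is defined exactly once, as SSA requires.
struct Inst {
    std::string dest, op, lhs, rhs;
};

// Forward constant propagation over straight-line SSA code. Because no name is
// ever reassigned, a fact like "%t1 == 5" stays true for the rest of the code.
std::map<std::string, long> propagateConstants(const std::vector<Inst>& code) {
    std::map<std::string, long> known;

    // An operand is a non-negative integer literal or the name of an SSA value.
    auto valueOf = [&known](const std::string& operand) -> std::optional<long> {
        if (!operand.empty() && std::isdigit(static_cast<unsigned char>(operand[0])))
            return std::stol(operand);
        auto it = known.find(operand);
        if (it != known.end())
            return it->second;
        return std::nullopt;  // unknown, e.g. a function argument
    };

    for (const Inst& inst : code) {
        std::optional<long> l = valueOf(inst.lhs), r = valueOf(inst.rhs);
        if (!l || !r)
            continue;  // cannot fold: at least one operand is not a constant
        if (inst.op == "add")
            known[inst.dest] = *l + *r;
        else if (inst.op == "mul")
            known[inst.dest] = *l * *r;
    }
    return known;
}
```

For instance, given %t1 = add 2, 3; %t2 = mul %t1, 4; %t3 = add %t2, %arg, the function records %t1 = 5 and %t2 = 20 and leaves %t3 unknown, because %arg is not a constant.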

65 of 193

(Quick Aside: SSA example from Wikipedia)

Not SSA

Uses SSA

Notice how quickly we can see that an extra variable can be eliminated
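For straight-line code, converting to SSA amounts to giving every assignment a fresh version number and pointing each use at the latest version. For example, y := 1; y := 2; x := y becomes y1 := 1; y2 := 2; x1 := y2, and it is then obvious that y1 is never read. The sketch below implements only that renaming (no branches, so no phi nodes are needed) plus a simple dead-definition check; the Stmt struct, toSSA, deadDefinitions, and the liveOut parameter are hypothetical names chosen for this example.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy statement: dest := f(uses...), where each use is a variable name or a literal.
struct Stmt {
    std::string dest;
    std::vector<std::string> uses;
};

static bool isName(const std::string& s) {
    return !s.empty() && std::isalpha(static_cast<unsigned char>(s[0]));
}

// Rename straight-line code into SSA form: every definition gets a new version,
// and every use refers to the most recent version of that variable.
// (Toy assumption: original variable names do not already end in digits.)
std::vector<Stmt> toSSA(const std::vector<Stmt>& code) {
    std::map<std::string, int> version;
    std::vector<Stmt> out;
    for (const Stmt& s : code) {
        Stmt renamed;
        for (const std::string& u : s.uses) {  // rename uses before the new definition
            auto it = version.find(u);
            if (isName(u) && it != version.end())
                renamed.uses.push_back(u + std::to_string(it->second));
            else
                renamed.uses.push_back(u);  // literal, or defined outside this code
        }
        renamed.dest = s.dest + std::to_string(++version[s.dest]);
        out.push_back(renamed);
    }
    return out;
}

// In SSA form, a definition whose name is never used (and is not needed after
// this code, i.e. not in liveOut) is dead and can be removed.
std::vector<std::string> deadDefinitions(const std::vector<Stmt>& ssa,
                                         const std::set<std::string>& liveOut) {
    std::set<std::string> used(liveOut);
    for (const Stmt& s : ssa)
        used.insert(s.uses.begin(), s.uses.end());
    std::vector<std::string> dead;
    for (const Stmt& s : ssa)
        if (!used.count(s.dest))
            dead.push_back(s.dest);
    return dead;
}
```

In SSA, checking whether a definition is dead only requires asking whether its unique name appears as an operand anywhere; without SSA, you would first have to work out which assignment of y each use refers to.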

67 of 193

(Again, more examples from the AOSA book, by Lattner himself)
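The AOSA chapter follows add1 with a deliberately inefficient recursive add. Below is that function in C/C++, with IR sketched by hand in the same style as the chapter; real clang++ output will differ in names and details. Unlike add1, it has several basic blocks, a comparison (icmp), a conditional branch (br), and a call.

```cpp
// Hand-written IR sketch in the style of the AOSA example:
//
//   define i32 @add2(i32 %a, i32 %b) {
//   entry:
//     %tmp1 = icmp eq i32 %a, 0
//     br i1 %tmp1, label %done, label %recurse
//
//   recurse:
//     %tmp2 = sub i32 %a, 1
//     %tmp3 = add i32 %b, 1
//     %tmp4 = call i32 @add2(i32 %tmp2, i32 %tmp3)
//     ret i32 %tmp4
//
//   done:
//     ret i32 %b
//   }
unsigned add2(unsigned a, unsigned b) {
    if (a == 0)
        return b;               // the "done" block
    return add2(a - 1, b + 1);  // the "recurse" block
}
```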

68 of 193

Using Clang++ and Generating IR

69 of 193

Example 1 | hello.cpp

  • Returning to our example of ‘hello world’
  • This command generated a .ll file (two lower-case L’s).
    • .ll files are the ‘textual’ form of LLVM’s IR.

(Note for Ubuntu users: if the above failed, try adding the -fno-use-cxa-atexit flag)

70 of 193

And here it is:

71 of 193

Pause -- Really take a second to look at the IR

What jumps out at you in this snippet?

Audience, what stands out?

72 of 193

My Findings

  • Source filename
  • Data layout
  • Target triple
  • Functions, structure types
  • Lots of % signs - These are registers (Remember the thing about SSA?)
  • Other important things (not in this IR--phi nodes)
  • Attributes
  • Type information! Cool--better than assembly!
  • Metadata (at the end, marked with “!”)

73 of 193

Targeting different backends

Looks like good information to have for this stage (which we will not get to today)

Are you enjoying the readability of IR yet?

Good news: machines like IR too

75 of 193

LLVM Tools - lli

  • lli - Directly executes programs in LLVM bitcode (or textual IR) using a JIT

76 of 193

The IR is very assembly-like -- very readable!

  • In fact, the machine can read it too: lli can directly execute the IR using its just-in-time (JIT) execution engine, which compiles the IR for the current architecture.
  • Let’s do it now using lli (“L L I”)
  • What do you see?
    • The program should execute, even though you never built an executable!
    • lli can directly execute IR!

  • (If you’re on Ubuntu 16.04, you may need an additional flag when generating the IR)
    • ./../llvm_build/bin/clang++ -S -emit-llvm hello.cpp -fno-use-cxa-atexit

IR has a binary form called bitcode (.bc).

Binary data is more compact, and thus faster to run through a JIT!

78 of 193

LLVM Tools - llvm-as

  • llvm-as - Takes LLVM IR in assembly (textual) form and converts it to bitcode format.

79 of 193

Let’s convert .ll to a .bc file | llvm-as

The LLVM assembler (llvm-as) converts the textual (human-readable) IR to bitcode, and now we have “hello.bc”.

80 of 193

Same result, as expected!

81 of 193

lli executes bitcode (binary format of IR)

My claim is that the JIT engine can execute bitcode more efficiently (Why?).

^ Bitcode is a binary representation of the textual .ll format we saw previously: a little more compressed, with a smaller file size.

Eventually we may want the assembly for our target machine so that we can build an executable.

84 of 193

LLVM Tools - llc

  • llc - Static compiler that takes LLVM input (assembly or bitcode) and generates assembly code for a target

85 of 193

The full circle -- compile our IR to assembly (.s file)

Run llc on our .bc file, which creates an assembly file (hello.s)

hello.s

A wide variety of targets is available for you to generate assembly code for.

At this point in the talk, we have played with IR and gotten familiar with some tools.

We have not yet used the optimizer (i.e. Lattner’s big idea)

89 of 193

LLVM Tools - opt

  • opt - LLVM analyzer and optimizer, which runs optimization and analysis passes on LLVM IR files

90 of 193

Let’s run opt | ./../opt hello.ll --time-passes

91 of 193

Passes with ‘opt’

  • opt is the ‘optimizer’
  • It works by making several passes through a module of code, looking for opportunities to ‘optimize’ the code.
  • There are several ways to ‘pass’ through the code, either to gather information or to make code changes.

93 of 193

Different Types of Passes in LLVM

  • Levels of Granularity
  • Analysis Passes versus Transform Passes
    • Analysis Pass - Computes information that other passes can use, or that is useful for debugging and understanding the program
    • Transform Pass - Mutates the program.
      • i.e. A side effect occurs, which can invalidate the results of other passes!
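The analysis/transform split shows up in a pass's return value: in LLVM's legacy pass manager, runOnFunction returns false if the pass did not modify the function and true if it did. The toy model below mimics that contract without LLVM headers. ToyFunction, ToyPass, CountInstructions, RemoveNops, and runPasses are all hypothetical names, and the ‘IR’ is just a list of strings. It needs C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy IR: a function is a name plus a flat list of instructions.
struct ToyFunction {
    std::string name;
    std::vector<std::string> insts;
};

// Toy stand-in for a function pass. As with LLVM's runOnFunction, the return
// value answers one question: did this pass modify the function?
struct ToyPass {
    virtual ~ToyPass() = default;
    virtual bool runOnFunction(ToyFunction& f) = 0;
};

// Analysis pass: computes information, never changes the IR.
struct CountInstructions : ToyPass {
    std::optional<std::size_t> count;
    bool runOnFunction(ToyFunction& f) override {
        count = f.insts.size();
        return false;  // nothing was modified
    }
};

// Transform pass: mutates the IR by deleting "nop" instructions.
struct RemoveNops : ToyPass {
    bool runOnFunction(ToyFunction& f) override {
        std::vector<std::string> kept;
        for (const std::string& inst : f.insts)
            if (inst != "nop")
                kept.push_back(inst);
        bool changed = kept.size() != f.insts.size();
        f.insts = kept;
        return changed;  // true only if something was actually removed
    }
};

// Toy "pass manager": runs every pass in order and reports whether any of them
// modified the function. (A real pass manager also uses this information to
// decide which cached analysis results must be recomputed.)
bool runPasses(ToyFunction& f, const std::vector<ToyPass*>& passes) {
    bool changed = false;
    for (ToyPass* pass : passes)
        if (pass->runOnFunction(f))
            changed = true;
    return changed;
}
```

If CountInstructions runs before RemoveNops on a function {nop, add, nop, ret}, its stored count of 4 no longer matches the function afterwards (2 instructions); that stale result is exactly the kind of invalidation a transform pass can cause.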

Our next task:

Learn how to analyze IR with passes. This can lead down paths such as:

  1. Code optimization
  2. Code understanding
  3. etc.

100 of 193

Goal - Print all of the Functions in a program

Guesses from the audience?

Maybe I would accept other answers as well, but a “Function Pass” is the easiest route

104 of 193

Writing Our First Function Pass

105 of 193

We will be working in: llvm/lib/Transforms/Hello/Hello.cpp

  • This is given to you when you download LLVM
    • (You can learn how to add more passes here)

106 of 193

(A visual, in case anyone set up Code::Blocks)

107 of 193

Okay, here is Hello.cpp

It is a FunctionPass

108 of 193

(This code is included with LLVM)

110 of 193

The piece we care about for now

111 of 193

Building our hello pass

  • Navigate to the build directory
  • In the lib/Transforms/Hello folder you’ll find a Makefile
  • Type ‘make’
  • Any changes we have made will be built.

112 of 193

Our pass is then compiled into build/lib/ as LLVMHello.so

113 of 193

Run our first pass with opt on hello.bc

  • opt: the tool we have used before
  • We load the library that contains our passes
  • Path to our LLVMHello pass library
  • The particular function pass we want to run
  • Our input file (.bc or .ll file)

(The command follows the same pattern used later with loops.ll: ./../opt -load ./../../lib/LLVMHello.so -hello < hello.bc > /dev/null)

  • Neat, we see all of the functions!
    • Or rather, we have one ‘main’ function in our program.

119 of 193

Anatomy of a “Pass”

  • The piece of code that does the work
  • We are not ‘mutating code’, so return false.
  • Inherit from the ‘FunctionPass’ class
  • Register the pass. This is how the pass is made available to opt under a name
    • i.e. how I knew what to type on the command line in our example
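Putting the four annotated pieces together: the real Hello.cpp defines a struct deriving from FunctionPass, overrides runOnFunction (which prints the function's name and returns false), and registers it with RegisterPass under the name "hello". To show just that shape without LLVM headers, here is a runnable toy mirror of it. ToyFunctionPass, toyRegistry, ToyHello, and runNamedPass are hypothetical stand-ins, not LLVM APIs. It needs C++17 for [[maybe_unused]].

```cpp
#include <functional>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for capturing the output
#include <string>
#include <vector>

struct ToyFunction {
    std::string name;
};

// Plays the role of llvm::FunctionPass: one hook, called once per function.
struct ToyFunctionPass {
    virtual ~ToyFunctionPass() = default;
    virtual bool runOnFunction(ToyFunction& f) = 0;  // true = "I modified f"
};

using ToyPassFactory = std::function<std::unique_ptr<ToyFunctionPass>(std::ostream&)>;

// Plays the role of RegisterPass<...>: maps a command-line name to a factory.
std::map<std::string, ToyPassFactory>& toyRegistry() {
    static std::map<std::string, ToyPassFactory> registry;
    return registry;
}

// 1. Inherit from the function-pass base class.
struct ToyHello : ToyFunctionPass {
    explicit ToyHello(std::ostream& out) : out_(out) {}

    // 2. The piece of code that does the work: print each function's name.
    bool runOnFunction(ToyFunction& f) override {
        out_ << "Hello: " << f.name << '\n';
        return false;  // 3. We are not mutating code, so return false.
    }

private:
    std::ostream& out_;
};

// 4. Register the pass under the name you would type on the command line.
[[maybe_unused]] static const bool toyHelloRegistered = [] {
    toyRegistry()["hello"] = [](std::ostream& out) {
        return std::unique_ptr<ToyFunctionPass>(new ToyHello(out));
    };
    return true;
}();

// A tiny "opt": look a pass up by name and run it on every function.
bool runNamedPass(const std::string& name, std::vector<ToyFunction>& functions,
                  std::ostream& out) {
    auto it = toyRegistry().find(name);
    if (it == toyRegistry().end())
        return false;  // unknown pass name
    std::unique_ptr<ToyFunctionPass> pass = it->second(out);
    for (ToyFunction& f : functions)
        pass->runOnFunction(f);
    return true;
}
```

Running runNamedPass("hello", ...) over functions named main and helper writes "Hello: main" and "Hello: helper", one per line; an unregistered name is rejected, much as opt complains about an option it does not recognize.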

125 of 193

Congratulations on writing/running your first pass

LLVM is properly configured; on to more analysis!

126 of 193

Static Analysis

Goal of Static Analysis: What information, bugs, or performance errors can we uncover before we run the program?

Pros: Gives us full coverage of the program

Cons: No real runtime data; can be overly conservative

127 of 193

Our Second pass -- This time we collect some program stats

  1. It will print the function name
  2. It will count the basic blocks and instructions.

128 of 193

Our Second pass -- This time we collect some program stats

  • It will print the function name
  • It will count the basic blocks and instructions.
  • We’ll use this new sample source code (loops.cpp), or even better, use one of your own!

129 of 193

Compile loops.cpp and test the -hello pass on loops.ll

  1. Compile the program to IR
    • ./../clang++ -S -emit-llvm loops.cpp
  2. Test opt with our old pass (note that we can just use the .ll version for this sample)
    • ./../opt -load ./../../lib/LLVMHello.so -hello < loops.ll > /dev/null

130 of 193

The Stats Pass source code

Okay, here is our second pass

It is a FunctionPass that collects stats

131 of 193

The Stats Pass source code

Here is where we accumulate the counts of basic blocks and instructions in our function

132 of 193

The Stats Pass source code

Notice that within a function, we can iterate through its basic blocks, and through every instruction within each basic block

133 of 193

The Stats Pass source code

And finally we output this information
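The three stats slides boil down to two nested loops. Here is a compile-anywhere sketch of that loop shape using hypothetical `toy` types in place of LLVM's; in a real pass the loops would be `for (BasicBlock &BB : F)` and `for (Instruction &I : BB)`, and the output would typically go to `errs()`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

namespace toy {

// Hypothetical stand-ins: a function owns basic blocks, a block owns instructions.
struct Instruction { std::string opcode; };
struct BasicBlock { std::vector<Instruction> insts; };
struct Function {
  std::string name;
  std::vector<BasicBlock> blocks;
};

struct FunctionStats {
  std::string name;
  std::size_t basicBlocks = 0;
  std::size_t instructions = 0;
};

// Walk every basic block, and every instruction in each block, accumulating
// counts. LLVM equivalent: for (BasicBlock &BB : F) for (Instruction &I : BB)
FunctionStats collectStats(const Function &F) {
  FunctionStats s;
  s.name = F.name;
  for (const BasicBlock &BB : F.blocks) {
    ++s.basicBlocks;
    for (const Instruction &I : BB.insts) {
      (void)I;  // only counting here; a real pass could inspect the opcode
      ++s.instructions;
    }
  }
  return s;
}

// One line of output per function, e.g. "countDown: 2 basic blocks, 3 instructions".
std::string formatStats(const FunctionStats &s) {
  std::ostringstream os;
  os << s.name << ": " << s.basicBlocks << " basic blocks, "
     << s.instructions << " instructions";
  return os.str();
}

}  // namespace toy
```

LLVM also exposes these counts directly (`F.size()` is the number of basic blocks in a function and `BB.size()` the number of instructions in a block); the explicit loops are still the better starting point once you want to look at each instruction.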

134 of 193

(Don’t forget to save and rebuild our pass)

135 of 193

Results of pass 2 (with loops.ll)

  • ./../opt -load ./../../lib/LLVMHello.so -hello2 < loops.ll > /dev/null

136 of 193

Results of pass 2 (with loops.ll)

  • ./../opt -load ./../../lib/LLVMHello.so -hello2 < loops.ll > /dev/null

Same library, just a different pass. That’s it!

137 of 193

Results of pass 2 (with loops.ll)

  • ./../opt -load ./../../lib/LLVMHello.so -hello2 < loops.ll > /dev/null

Observe that the same pass runs on every function. There is no “memory” here of previous runs; for that, we need a data structure, an analysis pass, or perhaps a “module pass”.

138 of 193

Results of pass 2 (with loops.ll)

  • ./../opt -load ./../../lib/LLVMHello.so -hello2 < loops.ll > /dev/null

  • Let’s add more!
  • What can we do with instruction information?

139 of 193

Here’s homework for later!

I’m not pulling these ideas from nowhere!

140 of 193

Okay, here is our third pass

It is a FunctionPass that shows direct function calls

141 of 193

142 of 193

Find Direct Calls

Added new header: #include "llvm/IR/CallSite.h"

143 of 193

Find Direct Calls

Added new header: #include "llvm/IR/CallSite.h"

What is a CallSite?

144 of 193

LLVM Docs

  • I do not actually know all of the LLVM APIs by heart.
  • As you start with LLVM, it is a good idea to keep the Doxygen documentation open.
  • Searching for “LLVM ______” will most often lead you to the correct page

145 of 193

LLVM Docs

  • From the documentation you can navigate to the appropriate function and even the source code

146 of 193

(Pssst! You have the source code as well)

Here is a sample grep

  • Grepping through the source code often gives you ideas of how to use an API
  • I certainly do not compare myself to the LLVM experts!

147 of 193

(continued) Find Direct Calls

Added new header: #include "llvm/IR/CallSite.h"

If our instruction is not a call site (i.e., not a function call)

148 of 193

(continued) Find Direct Calls

Added new header: #include "llvm/IR/CallSite.h"

Find out whether our ‘callee’ is a direct function call (not a call through a function pointer)
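The two checks on these slides (is this instruction a call at all, and is its callee known statically?) can be sketched without LLVM as follows. `toy::Instruction` is a hypothetical stand-in; in LLVM 5.0 the first check is roughly constructing a `CallSite` from the instruction and testing whether it wraps a call, and the second is `getCalledFunction()`, which returns nullptr for an indirect call through a function pointer.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

namespace toy {

// Hypothetical instruction: `isCall` says whether it is a call at all, and
// `calledFunction` holds the callee's name only when it is known statically.
struct Instruction {
  bool isCall = false;
  std::optional<std::string> calledFunction;  // empty for indirect calls
};

// Returns the names of directly called functions, in program order.
std::vector<std::string> directCallees(const std::vector<Instruction> &insts) {
  std::vector<std::string> callees;
  for (const Instruction &I : insts) {
    if (!I.isCall) continue;          // not a call site: skip
    if (!I.calledFunction) continue;  // indirect call via a pointer: skip
    callees.push_back(*I.calledFunction);
  }
  return callees;
}

}  // namespace toy
```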

149 of 193

The Result!

  • Simple little function pass
  • Now you can use this information to build a data structure
    • The function “F” is the caller, and “f” the callee.
    • Each caller/callee pair forms an edge that could be put into a graph data structure.
    • Then output static graphs!
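As the slide suggests, each (caller, callee) pair can become an edge in a small graph structure, and Graphviz DOT text is an easy way to get that graph into a viewer such as xdot. This is a minimal standalone sketch; `StaticCallGraph` is a hypothetical name and is unrelated to LLVM's own `CallGraph` analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Collects caller -> callee edges found by the direct-call pass and prints
// them as a Graphviz digraph. std::set drops repeated edges (a function that
// calls printf three times still gets one edge) and keeps the output sorted.
class StaticCallGraph {
 public:
  void addEdge(const std::string &caller, const std::string &callee) {
    assert(!caller.empty() && !callee.empty() && "edges need two named ends");
    edges_.insert({caller, callee});
  }

  std::size_t edgeCount() const { return edges_.size(); }

  // Names are quoted so demangled names such as "countDown()" stay valid DOT.
  std::string toDot() const {
    std::ostringstream os;
    os << "digraph callgraph {\n";
    for (const auto &edge : edges_)
      os << "  \"" << edge.first << "\" -> \"" << edge.second << "\";\n";
    os << "}\n";
    return os.str();
  }

 private:
  std::set<std::pair<std::string, std::string>> edges_;
};
```

Writing the result of `toDot()` to a file (say, a hypothetical callgraph.dot) should let you open it with `xdot callgraph.dot`, the same way as the CFG .dot files that opt can generate.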

150 of 193

Bonus Trick: Outputting graphs

151 of 193

LLVM actually provides a pass that can output control flow graphs

  • Install a dot file viewer
    • sudo apt install xdot (for Linux)
  • Generate a .dot file for each function with
    • ./../opt -dot-cfg-only loops.ll > /dev/null
  • View a .dot file with
    • xdot cfg._Z9countDownv.dot

152 of 193

Here is the ‘countDown’ function from loops.cpp

153 of 193

Here is the ‘countDown’ function from loops.cpp

  • In this way, you can gradually map each basic block in the visualization back to the C++ code.

154 of 193

Here is the ‘countDown’ function from loops.cpp

  • You can also map each basic block in the visualization directly to the IR

155 of 193

Dynamic Analysis

Goal of Dynamic Analysis: What information/bugs/performance errors can we uncover when we run the program?

Pros: Gives us real values

Cons: Instrumentation affects results and performance

156 of 193

Dynamic Analysis

Goal of Dynamic Analysis: What information/bugs/performance errors can we uncover when we run the program?

Pros: Gives us real values

Cons: Instrumentation affects results and performance

Why use LLVM for this?

We can insert code to monitor or change the behavior of our program.

157 of 193

Adding in Functions (For Dynamic Analysis)

  • Typically this is done in an ad-hoc fashion
    • Either sprinkling ‘printf’ calls everywhere
    • Or lots of #ifdef/#endif blocks
  • If we have our source code, we can inject code as needed.
    • No need to mess up or keep copies of various source versions.

  • Fair warning: I am running through these examples quickly, but you have the slides
    • (Lots of source code on the slides ahead; I am breaking PowerPoint rules!)

158 of 193

Step 1:

Let’s write the code we want to instrument our program with

159 of 193

Step 1: Write a ‘hook’ or ‘profiling code’

Let’s write the code we want to instrument our program with

Here is a function, ‘__initMain’, that prints a message; a call to it will be inserted into our ‘main’ function

160 of 193

Step 1: Generate IR for hook

Now let’s create the intermediate representation of our code.

Donzo. Finished. IR is ready

161 of 193

Step 1: Generate IR for hook

Now let’s create the intermediate representation of our code.

Donzo. Finished. IR is ready

This is our function name. Note that it “looks weird”: it is a mangled C++ function name.
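To turn a mangled name such as `_Z9countDownv` (the one in the `cfg._Z9countDownv.dot` file name earlier) back into a C++ signature, you can call the Itanium C++ ABI demangler that GCC and Clang ship. A minimal sketch, assuming a GCC or Clang toolchain that provides `<cxxabi.h>` (MSVC mangles names differently and does not provide this header).

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // Itanium C++ ABI helpers (GCC/Clang, not MSVC)
#include <memory>
#include <string>

// __cxa_demangle returns a malloc'd buffer, so release it with std::free.
struct FreeDeleter {
  void operator()(char *p) const { std::free(p); }
};

// Returns the readable signature for a mangled name such as "_Z9countDownv",
// or the input unchanged if demangling fails.
std::string demangle(const std::string &mangled) {
  int status = 0;
  std::unique_ptr<char, FreeDeleter> readable(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
  if (status != 0 || !readable) return mangled;
  return std::string(readable.get());
}
```

For example, `demangle("_Z9countDownv")` yields `countDown()`. A hypothetical `__initMain(int)` would be mangled as `_Z10__initMaini` (the slides only say the hook takes one argument, so `int` is an assumption). The `c++filt` tool does the same translation on the command line. If you prefer an unmangled hook name, declaring the hook `extern "C"` suppresses C++ name mangling, so it appears as plain `__initMain` in the IR.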

162 of 193

Step 2: Let’s find the code we want to modify

How about our hello.cpp program? We already have hello.ll from previous examples.

This is the simplest program, with only one function

163 of 193

Now it’s time for a module pass

New header needed: #include "llvm/IR/Module.h"

Why?

  1. To show you a module pass
  2. It makes a little more sense (to me) to search the module for the functions I want to instrument.

164 of 193

The Module pass | Setup in 3 parts (in my code)

165 of 193

The Module pass

  1. Create a “stub” function

166 of 193

The Module pass

  • Notice that it is using the ‘mangled’ C++ function name

167 of 193

The Module pass

  2. This next chunk of code iterates through the Module to look at all of its functions

168 of 193

The Module pass

  3. I am modifying code, so I return true from this pass

169 of 193

setupHooks()

This code creates “a placeholder” (a declaration) for our hook in the source program. I do not link in my instrumentation code until the very end.

170 of 193

setupHooks()

This code creates “a placeholder” (a declaration) for our hook in the source program. I do not link in my instrumentation code until the very end.

The key observation from setupHooks() is that I am building up a ‘function’ type that returns void and takes one argument

171 of 193

setupHooks()

This code creates “a placeholder” (a declaration) for our hook in the source program. I do not link in my instrumentation code until the very end.

The key observation from setupHooks() is that I am building up a ‘function’ type that returns void and takes one argument

Which is exactly the signature of __initMain

172 of 193

InstrumentEnterFunction

  • Same idea as in setupHooks()
  • I am building up a specific function call to insert

173 of 193

InstrumentEnterFunction

  • Same idea as in setupHooks()
  • I am building up a specific function call to insert

Why not do something simpler?

With this approach, I can push different values as parameters based on whatever I need to do.
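Taken together, setupHooks() and InstrumentEnterFunction leave two things in the module: a declaration (the stub) of a void function with one argument, and a call to it at the start of `main` whose argument the pass chooses. The toy below only shows that end state as plain IR text, so it builds without LLVM; it is not how the real pass manipulates IR. The hook name `_Z10__initMaini` used with it assumes a hypothetical `__initMain(int)`. In LLVM the declaration would come from something like `M.getOrInsertFunction(...)` with a `FunctionType` built from `Type::getVoidTy` and `Type::getInt32Ty`, and the call from an `IRBuilder` positioned in the entry block.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace toy {

// Hypothetical module: declarations plus functions whose bodies are IR text.
struct Function {
  std::string name;
  std::vector<std::string> body;  // entry block first
};

struct Module {
  std::vector<std::string> declarations;
  std::vector<Function> functions;
};

// Like setupHooks(): ensure one "stub" declaration of `void hook(i32)` exists.
// The definition is supplied later, when llvm-link merges in the hook's IR.
void declareHook(Module &M, const std::string &hook) {
  assert(!hook.empty());
  const std::string decl = "declare void @" + hook + "(i32)";
  for (const std::string &d : M.declarations)
    if (d == decl) return;  // already declared
  M.declarations.push_back(decl);
}

// Like InstrumentEnterFunction: put a call to the hook at the start of `fn`.
// The argument value is chosen by the pass, which is the point of building
// the call explicitly. Returns true if the module was modified.
bool instrumentEntry(Module &M, const std::string &fn, const std::string &hook,
                     int arg) {
  for (Function &F : M.functions) {
    if (F.name != fn) continue;
    F.body.insert(F.body.begin(),
                  "call void @" + hook + "(i32 " + std::to_string(arg) + ")");
    return true;
  }
  return false;
}

}  // namespace toy
```

Running the toy on a `main` whose body ends in `ret i32 0`, with an argument of 42, leaves `call void @_Z10__initMaini(i32 42)` as its first line and a single `declare void @_Z10__initMaini(i32)` at module level; after llvm-link, that declaration is satisfied by the hook's real definition.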

174 of 193

Steps to running pass number 4!

Get our source code set up by running our pass on it.

./../opt -load ./../../lib/LLVMHello.so -hello4 -S < hello.ll > readyToBeHooked.ll

Link in our instrumentation

./../llvm-link readyToBeHooked.ll instrumentation.ll -S -o instrumentDemo.ll

175 of 193

LLVM Tools - llvm-link

  • clang - Clang is the frontend C/C++ compiler (LLVM is the backend)
    • Likely you have heard of or used Clang even if you did not know it!
  • llvm-as - Takes LLVM IR in assembly form and converts it to bitcode format.
  • llvm-dis - Converts bitcode to human-readable LLVM assembly
  • llvm-link - Links two or more LLVM IR files (bitcode or .ll) into one file.
  • lli - Directly executes programs in LLVM bitcode using a JIT
  • llc - Static compiler that takes LLVM input (assembly or bitcode) and generates assembly code
  • opt - LLVM analyzer and optimizer which runs certain optimizations and analyses on files
  • More

176 of 193

LLVM Tools - llvm-link

  • clang - Clang is the frontend C/C++ compiler (LLVM is the backend)
    • Likely you have heard of or used Clang even if you did not know it!
  • llvm-as - Takes LLVM IR in assembly form and converts it to bitcode format.
  • llvm-dis - Converts bitcode to human-readable LLVM assembly
  • llvm-link - Links two or more LLVM IR files (bitcode or .ll) into one file.
  • lli - Directly executes programs in LLVM bitcode using a JIT
  • llc - Static compiler that takes LLVM input (assembly or bitcode) and generates assembly code
  • opt - LLVM analyzer and optimizer which runs certain optimizations and analyses on files
  • More

Now that our files are merged, there is a declaration and a definition for our instrumentation!

177 of 193

llvm-link

  • Think of this as a ‘linker’ for IR code.
  • Sometimes it is useful to link all of your code together and then run your optimizations
    • We call this “whole program optimization”

178 of 193

Grand Finale!

Run our linked .ll file (using lli, or by compiling it to an executable)

179 of 193

Grand Finale!

Run our linked .ll file (using lli, or by compiling it to an executable)

It works: we see our message before the “Bonjour” from hello.cpp!

180 of 193

Going Further (Challenges/Project Ideas)

Time permitting:

  • Easy
    • Print out function arguments
    • Recover and print metadata and/or profile-guided optimization (PGO) data for functions
    • Write a Python script that runs llvm-link on all of your .ll files.
  • Medium
    • Build both a control flow graph and a call graph, and output them to .dot
    • Find program attributes
      • Add an attribute to any function with fewer than 10 instructions to force it to be inlined
  • Hard/Interesting?
    • Auto-vectorization (find patterns and insert SIMD instructions)
    • Investigate the “sanitizer” projects. See if you can add interesting printouts.

181 of 193

Resources

182 of 193

Resources

183 of 193

More Guidance - Your LLVM Syllabus

184 of 193

Contributing to LLVM

185 of 193

186 of 193

Conclusion

  • LLVM is an exciting project with a lot of power
  • LLVM or its related projects are likely the ‘right’ tool if you are working on programming languages, performance, or tool building
  • If you are still not convinced, your takeaway can be to look at the codebase and see some great engineering in C++.
  • It’s big, but it should not be scary
    • The difficulty is that there are a lot of ‘new’ things to learn
    • You can do it!

187 of 193

Feedback form: https://tinyurl.com/fosdem18llvmintro

(Whether you watched this talk now or in the future!)

188 of 193

Make sure we save the output of opt

  • Something new with this pass is that it actually modifies code.
  • Occasionally you may see this message

  • In our case, yes, we do want to output the modified IR, but this time to a new file.

189 of 193

Some Gotchas

  • Having trouble with llvm-config?
    • Make sure your PATH variable is updated
    • export PATH=/home/mike/Desktop/llvm/llvm_build/bin/:$PATH

190 of 193

191 of 193

Useful debugging things

The dump() method (available on IR objects such as a Function, BasicBlock, or Instruction)

192 of 193

Build your own LLVM language

193 of 193

LLVM Backend information
