Abstract (Last updated 2/01/18)
Abstract: In this talk, Michael Shah ("Mike") will present an introduction to the LLVM Compiler Infrastructure. The first part of the talk discusses what LLVM is, who is using it, and why you might be interested in using it. The second part shows interactive examples, taking us from installation to building and running our first function pass. We will then build on that first function pass to begin outputting some metrics about programs. Mike will also present some steps on how to proceed further and what resources are available for working with LLVM.
Materials:
Resources:
Contact: mshah.475@gmail.com
Twitter: @MichaelShah
www.mshah.io/fosdem18.html
Terminology (open it in a second browser window if you like)
Introduction to LLVM
(Tutorial)
Mike Shah, Ph.D.
February 4, 2018
60-75 minutes for the talk (plenty of time for questions)
Demo Time! Right from the start!
Who Am I?
by Mike Shah
This is an introduction to LLVM
We have some specific goals
Goals for Tomorrow
Because you’ll be ready to think about more solutions
Slides and code are at the following location: www.mshah.io/fosdem18.html
What is LLVM?
LLVM (formerly known as Low Level Virtual Machine, but it's more!)
What is it that makes LLVM so great that programmers are paying attention to it?
The Secret Recipe
Chris Lattner's big idea
C++ Source
Lexers & parsers
Perform standard optimizations
Code generator
Machine Code (1010101010101010)
The big idea | Around the year 2000
The Optimizer
The optimization stage of compilers
Example of what IR instructions look like
Source: https://llvm.org/docs/LangRef.html
How to get LLVM (and all the tools)
I am actually going to run through this section very quickly!
Use it as a reference for how to set up and run the examples from this slide deck.
The LLVM project evolves at a good pace.
That is why you will want to know how to build from source to get the latest changes.
Where the instructions will always be
Downloading LLVM 5.0
Create a directory on your desktop
Subdirectories
From a Terminal
Now get lunch/dinner/breakfast, depending on the speed of your CPU.
How will we know it worked?
How to get LLVM
(Expect roughly 15-45 minutes or more to build from source, depending on your CPU and internet connection.)
Assumption: we all have a working LLVM at this point.
Our first example | Emitting LLVM's intermediate form
Compile and run
Again, make sure you are using the version of clang++ that we built!
Now we can use clang++ to emit LLVM IR
Our goal: get an intermediate representation.
Then we can talk more about this step:
(Use clang++ -help to see the options.)
Aside: Clang++, isn’t this an LLVM talk?
LLVM Tools
LLVM Tools - clang/clang++
Wait a second, Mike!
So clang, or perhaps other tools, can work with this "LLVM"?
Yes
No
Modularity
Source: AOSA Book (The Architecture of Open Source Applications)
Okay, now let us take a closer look at that IR.
[Pop Quiz] What does this function do?
Guesses from the audience?
Well, it is named "add1".
There are 2 i32 arguments.
i32 = a 32-bit integer (an int on typical platforms)
Every function has a starting point (its entry basic block).
We store the result of an 'add' operation.
Then we return the result as an int.
If you can read assembly (or even C!), you can understand LLVM Intermediate Representation.
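To back up that claim, here is a C++ function that does what the add1 quiz describes. The IR in the comment is a sketch of the usual shape of such a function, modeled on the add1 example in the AOSA book. It is not a verbatim copy of the slide.

```cpp
#include <cstdint>

// C++ counterpart of an add1 function. Sketch of the corresponding IR
// (typical shape, not copied from the slide):
//
//   define i32 @add1(i32 %a, i32 %b) {
//   entry:
//     %tmp1 = add i32 %a, %b
//     ret i32 %tmp1
//   }
//
// i32 is a 32-bit integer, so std::int32_t is the closest C++ match.
std::int32_t add1(std::int32_t a, std::int32_t b) {
    // entry: the function's starting basic block
    std::int32_t tmp1 = a + b;  // %tmp1 = add i32 %a, %b
    return tmp1;                // ret i32 %tmp1
}
```

One subtle difference is worth keeping in mind. A plain IR `add` on i32 wraps around on overflow, whereas signed overflow in C++ is undefined behavior. For this reason, clang typically emits `add nsw` for signed int additions.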
LLVM's Secret Sauce
LLVM IR
Source: AOSA Book
(Quick Aside: SSA (static single assignment) example from Wikipedia)
Not SSA | Uses SSA
Notice right away that we can eliminate an extra variable.
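This sketch assumes the slide used Wikipedia's introductory SSA example (y := 1; y := 2; x := y). Below, that example is written twice in C++: first as ordinary code, then in SSA style, where each name is assigned exactly once. In the SSA version, each use points at exactly one definition. That makes it obvious that the first assignment is dead.

```cpp
// Not SSA: the variable y is assigned twice, so a reader (or an optimizer)
// has to reason about which assignment reaches the use in "x = y".
int notSSA() {
    int y = 1;
    y = 2;
    int x = y;
    return x;
}

// SSA style: every variable is assigned exactly once. It is now plain that
// y1 is never used, so "y1 = 1" is dead code and can be eliminated.
int usesSSA() {
    const int y1 = 1;  // dead: no later use of y1
    const int y2 = 2;
    const int x1 = y2;
    return x1;
}
```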
(Again, more examples from the AOSA book chapter, written by Lattner himself)
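The AOSA chapter also shows a recursive add. Here is a C++ sketch in that spirit: if a is zero, return b; otherwise, recurse on (a - 1, b + 1). The comments show roughly which IR constructs each line needs. Treat the IR fragments as approximate, not as the book's exact listing. The interesting part is that the branch forces the IR to split into separate basic blocks.

```cpp
// Recursive add in the spirit of the AOSA "add2" example.
// In IR this needs a comparison (icmp), a conditional branch (br),
// and separate basic blocks for the "done" and "recurse" paths.
unsigned add2(unsigned a, unsigned b) {
    if (a == 0)                 // %tmp1 = icmp eq i32 %a, 0 ; br i1 %tmp1, ...
        return b;               // done:    ret i32 %b
    return add2(a - 1, b + 1);  // recurse: sub, add, call @add2, ret
}
```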
Using Clang++ and Generating IR
Example 1 | hello.cpp
(Note for Ubuntu users: if the above failed, try adding the -fno-use-cxa-atexit flag.)
And here it is:
Pause: really take a second to look at the IR.
What jumps out at you in this snippet?
Audience, what stands out?
My Findings
Targeting different backends
This looks like good information to have for the backend stage (which we will not get to today).
Are you enjoying the readability of IR yet?
Good news: machines like IR too.
LLVM Tools - lli
The IR is very assembly-like, and very readable!
IR also has a binary form called bitcode (.bc).
Binary data is more compact, and thus quicker to load and run through a JIT!
LLVM Tools - llvm-as
Let's convert .ll to a .bc file | llvm-as
The LLVM assembler converts the textual (human-readable) IR to bitcode, and now we have "hello.bc".
Same result, as expected!
lli executes bitcode (the binary format of IR)
My claim is that the JIT engine can execute this more efficiently (Why?).
^A binary representation of the textual .ll format we previously saw: a little more compressed, with a smaller file size.
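This makes the .ll vs. .bc distinction concrete. Raw bitcode files begin with the magic bytes 'B', 'C', 0xC0, 0xDE, while .ll files are plain text, typically starting with a "; ModuleID" comment. The helpers below are our own sketch, not an LLVM API. They do not handle the wrapper header that some toolchains (notably Apple's) place in front of bitcode.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file as raw bytes (empty vector if it cannot be opened).
std::vector<unsigned char> readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Raw LLVM bitcode starts with 'B' 'C' 0xC0 0xDE; textual .ll files do not.
bool hasRawBitcodeMagic(const std::vector<unsigned char>& bytes) {
    return bytes.size() >= 4 && bytes[0] == 'B' && bytes[1] == 'C' &&
           bytes[2] == 0xC0 && bytes[3] == 0xDE;
}
```

For example, `hasRawBitcodeMagic(readFileBytes("hello.bc"))` should be true for the file produced by llvm-as, and false for hello.ll.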
Eventually we may want the assembly for our target machine, so that we can build an executable.
LLVM Tools - llc
The full circle: compile our IR to assembly (.s file)
Run llc on our .bc file, which creates an assembly file (hello.s).
hello.s
A wide variety of targets are available for generating assembly code (llc --version lists the registered targets).
At this point in the talk, we have played with IR and gotten familiar with some tools.
We have not yet used the optimizer (i.e. Lattner's big idea).
LLVM Tools - opt
Let's run opt | ./../opt hello.ll --time-passes
Passes with ‘opt’
Different Types of Passes in LLVM
Our next task:
Learn how to analyze IR with passes. This can lead toward paths of:
Goal: Print all of the functions in a program
Guesses from the audience?
I might accept other answers as well, but a "Function Pass" is the easiest route.
Writing Our First Function Pass
We will be working in: llvm/lib/Transforms/Hello/Hello.cpp
(A visual, if anyone set up Code::Blocks)
This file is provided when you download LLVM (you can also learn how to add more passes here).
Okay, here is Hello.cpp
It is a FunctionPass.
(This code is included with LLVM.)
The piece we care about for now
Building our hello pass
Our pass is then compiled into build/lib/ as LLVMHello.so.
Run our first pass with opt on hello.bc
opt: the tool we have used before
We load the library that contains our passes
Path to our LLVMHello pass library
The particular function pass we want to run
Our input file (.bc or .ll file)
Anatomy of a "Pass"
runOnFunction: the piece of code that does the work
We are not mutating code, so we return false.
Inherit from the 'FunctionPass' class.
Register the pass. This is how the pass gets its command-line name,
i.e. how I knew what to type on the command line in our example.
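LLVM's Hello.cpp registers its pass with a file-scope object along the lines of `static RegisterPass<Hello> X("hello", "Hello World Pass");`. That "hello" string is what opt accepts as `-hello`. The toy below sketches the idea: a static object puts a factory into a name-to-pass map before main runs, and a lookup by name creates the pass. All names here (ToyPass, ToyRegisterPass, createPassByName) are hypothetical and are not the LLVM API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy pass interface (hypothetical; not llvm::FunctionPass).
struct ToyPass {
    virtual ~ToyPass() = default;
    // Returns true if the function was modified, like runOnFunction.
    virtual bool runOnFunction(const std::string& functionName) = 0;
};

struct ToyHelloPass : ToyPass {
    std::string lastSeen;
    bool runOnFunction(const std::string& functionName) override {
        lastSeen = functionName;  // "analysis": just remember the name
        return false;             // we did not mutate anything
    }
};

using PassFactory = std::function<std::unique_ptr<ToyPass>()>;

// Function-local static avoids static-initialization-order problems.
std::map<std::string, PassFactory>& passRegistry() {
    static std::map<std::string, PassFactory> registry;
    return registry;
}

// Constructing one of these at file scope registers a pass by name.
struct ToyRegisterPass {
    ToyRegisterPass(const std::string& argName, PassFactory factory) {
        passRegistry()[argName] = std::move(factory);
    }
};

static ToyRegisterPass registerHello("hello", []() -> std::unique_ptr<ToyPass> {
    return std::make_unique<ToyHelloPass>();
});

// Conceptually what "opt -hello" does: look the name up and create the pass.
std::unique_ptr<ToyPass> createPassByName(const std::string& argName) {
    auto it = passRegistry().find(argName);
    if (it == passRegistry().end()) return nullptr;
    return it->second();
}
```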
Congratulations on writing and running your first pass!
LLVM is properly configured; on to more analysis.
Static Analysis
Goal of Static Analysis: What information/bugs/performance errors can we uncover before we run the program?
Pros: Gives us full coverage of the program (every path, not just the ones a test run happens to take)
Cons: No real runtime data; overly conservative
Our second pass: this time we collect some program stats
Compile and test loops.cpp, then run the -hello pass on loops.ll
The Stats Pass source code
Okay, here is our second pass. It is a FunctionPass that collects stats.
Here is where we accumulate the counts of basic blocks and instructions within our function.
Notice that within a function we can iterate through its basic blocks, and through every instruction within each basic block.
And finally, we output this information.
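The structure of the stats pass is a nested loop: for each basic block in the function, count it, then count its instructions. In the real pass these loops are range-based for loops over an llvm::Function (which yields BasicBlock&) and over an llvm::BasicBlock (which yields Instruction&). The types below are hypothetical toy stand-ins, not the LLVM classes. They only mirror that containment shape, so the counting logic can be run and checked on its own.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-ins (NOT llvm::Instruction / BasicBlock / Function).
struct ToyInstruction { std::string opcode; };
struct ToyBasicBlock { std::vector<ToyInstruction> instructions; };
struct ToyFunction { std::string name; std::vector<ToyBasicBlock> blocks; };

struct FunctionStats {
    std::size_t basicBlocks = 0;
    std::size_t instructions = 0;
};

// Same shape as the pass: walk every block, and every instruction in it.
FunctionStats collectStats(const ToyFunction& f) {
    FunctionStats stats;
    for (const ToyBasicBlock& bb : f.blocks) {
        ++stats.basicBlocks;
        stats.instructions += bb.instructions.size();
    }
    return stats;
}

// The real pass prints with errs(); here we build a string for testing.
std::string formatStats(const ToyFunction& f) {
    const FunctionStats s = collectStats(f);
    std::ostringstream out;
    out << f.name << ": " << s.basicBlocks << " basic blocks, "
        << s.instructions << " instructions";
    return out.str();
}
```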
(Don't forget to save and rebuild our pass.)
Results of pass 2 (with loops.ll)
Same library, but a different pass; that's it!
Observe that the same pass runs on every function, with no "memory" of previous runs. To aggregate across functions, we would need a data structure, an analysis pass, or perhaps a "module pass".
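The "no memory" point can be illustrated with a hypothetical toy. A per-function step sees one function at a time and keeps no state. A module-level step, like a module pass, sees every function in one call and can therefore aggregate, for example computing a total and finding the largest function. None of these names are LLVM API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types, not LLVM classes.
struct ToyFunctionInfo {
    std::string name;
    std::size_t instructionCount;
};
struct ToyModule { std::vector<ToyFunctionInfo> functions; };

// Function-pass style: one function in, one answer out, nothing remembered.
std::size_t perFunctionCount(const ToyFunctionInfo& f) {
    return f.instructionCount;
}

struct ModuleSummary {
    std::size_t totalInstructions = 0;
    std::string largestFunction;
};

// Module-pass style: a single call sees every function, so it can aggregate.
ModuleSummary aggregateOverModule(const ToyModule& m) {
    ModuleSummary summary;
    bool seenAny = false;
    std::size_t largest = 0;
    for (const ToyFunctionInfo& f : m.functions) {
        const std::size_t n = perFunctionCount(f);
        summary.totalInstructions += n;
        if (!seenAny || n > largest) {
            seenAny = true;
            largest = n;
            summary.largestFunction = f.name;
        }
    }
    return summary;
}
```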
Here's homework for later!
I'm not pulling these ideas out of nowhere!
Okay, here is our third pass. It is a FunctionPass that shows direct function calls.
Find Direct Calls
Added new header: #include "llvm/IR/CallSite.h"
A CallSite??
LLVM Docs
(Pssst! You have the source code as well.)
Here is a sample grep:
(continued) Find Direct Calls
Added new header: #include "llvm/IR/CallSite.h"
If our instruction is not a call site (i.e. not a call to a function), we skip it.
Find out whether our 'callee' is a direct function call (not a call through a function pointer or anything similar).
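In the LLVM 5.0 API used here, the two checks roughly correspond to the following:

- Wrap the instruction in a CallSite and skip it when `CS.getInstruction()` is null.
- Use `getCalledFunction()`, which returns nullptr for indirect calls.

Later LLVM releases (around LLVM 11) removed CallSite in favor of CallBase. The toy below is hypothetical and uses no LLVM types. It just makes the two-step filter explicit and testable.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an instruction (NOT llvm::Instruction).
//   isCall:       is this a call instruction at all?
//   directCallee: callee name for a direct call; empty for an indirect call
//                 (e.g. a call through a function pointer).
struct ToyInstruction {
    bool isCall;
    std::string directCallee;
};

// Two-step filter: skip non-calls, then skip calls with no known callee.
std::vector<std::string> findDirectCalls(const std::vector<ToyInstruction>& insts) {
    std::vector<std::string> callees;
    for (const ToyInstruction& inst : insts) {
        if (!inst.isCall) continue;               // not a call site
        if (inst.directCallee.empty()) continue;  // indirect call
        callees.push_back(inst.directCallee);
    }
    return callees;
}
```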
The Result!
Bonus Trick: Outputting graphs
LLVM actually provides a pass that can output control flow graphs (for example, opt's -dot-cfg pass, which writes Graphviz .dot files).
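A control flow graph is just basic blocks as nodes and branches as directed edges. The DOT format that Graphviz reads is simple enough to emit by hand. The sketch below is a hypothetical toy, not LLVM's implementation. Its output is much plainer than what -dot-cfg produces, since the real nodes are labeled with the block's instructions. The block names in the example are invented.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy CFG: block names plus directed edges between them.
struct ToyCFG {
    std::string functionName;
    std::vector<std::string> blocks;
    std::vector<std::pair<std::string, std::string>> edges;  // from -> to
};

// Emit Graphviz DOT text: one node per block, one edge per branch target.
std::string toDot(const ToyCFG& cfg) {
    std::ostringstream out;
    out << "digraph \"" << cfg.functionName << "\" {\n";
    for (const std::string& b : cfg.blocks)
        out << "  \"" << b << "\";\n";
    for (const auto& e : cfg.edges)
        out << "  \"" << e.first << "\" -> \"" << e.second << "\";\n";
    out << "}\n";
    return out.str();
}
```

A loop shows up as a back edge, for example `{"loop", "loop"}` below. Writing the string to a file and running `dot -Tpng file.dot -o file.png` renders it.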
Here is the 'countdown' function from loops.cpp
Dynamic Analysis
Goal of Dynamic Analysis: What information/bugs/performance errors can we uncover when we run the program?
Pros: Gives us real values
Cons: Instrumentation affects results and performance
Why use LLVM for this?
We can insert/inject code to monitor or change the behavior of our code.
Adding in Functions (For Dynamic Analysis)
Step 1: Write a 'hook' or 'profiling code'
Let's write the instrumentation code that we want to inject.
Here is a function, '__initMain', that will be called from our 'main' function and print a message.
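Here is a minimal sketch of such a hook. It rests on assumptions that the text does not confirm:

- The single parameter is an int (the slides say only that it takes one argument).
- The message text is invented.
- `initMainMessage` is a hypothetical helper, added so the message can be checked separately.

Identifiers beginning with "__" are formally reserved in C++. The talk uses one anyway, and it is kept here for consistency.

```cpp
#include <iostream>
#include <string>

// Hypothetical helper: builds the hook's message (text is made up).
std::string initMainMessage(int value) {
    return "__initMain: entering main (value = " + std::to_string(value) + ")";
}

// Instrumentation hook: assumed signature void(int), matching
// "returns void and takes in one argument".
void __initMain(int value) {
    std::cout << initMainMessage(value) << std::endl;
}
```

Compiling this file with something like `clang++ -S -emit-llvm instrumentation.cpp -o instrumentation.ll` gives the IR to link in later. The source file name is an assumption, chosen to match the instrumentation.ll used in the llvm-link step.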
Step 1: Generate IR for the hook
Now let's create the intermediate representation of our code.
Donezo. Finished. The IR is ready.
This is our function name. Note that it "looks weird": it is a mangled function name.
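The "weird" name comes from C++ name mangling. Assume __initMain takes a single int (the argument type is not shown in the text). Under the Itanium C++ ABI, which GCC and clang use on Linux, its mangled name would be `_Z10__initMaini`: `_Z`, then the name length (10), then the name, then `i` for the int parameter. Declaring the hook `extern "C"` would turn mangling off. The sketch below demangles names with `abi::__cxa_demangle` from `<cxxabi.h>`, which is GCC/clang-specific. From a shell, `c++filt` does the same job.

```cpp
#include <cstdlib>
#include <cxxabi.h>  // Itanium C++ ABI helpers (GCC/clang, not MSVC)
#include <memory>
#include <string>

// __cxa_demangle returns a malloc'd buffer that must be released with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Demangle an Itanium-ABI name; return the input unchanged if it is not
// a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    if (status != 0 || !result) return mangled;
    return std::string(result.get());
}
```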
Step 2: Let's find the code we want to modify
How about our hello.cpp program? We already have hello.ll from previous examples.
This is the simplest program, with one function.
Now time for the Module pass
New header needed: #include "llvm/IR/Module.h"
Why?
The Module pass | Setup in 3 parts (in my code)
2.) This next chunk of code iterates through a Module to look at all of its functions.
3.) I am modifying code, so I return true for this pass.
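Here is a hypothetical toy of what a module-level instrumentation step does. It walks every function in the module, inserts a call to the hook at the start of main, and reports whether anything changed. Instructions are plain strings, the hook's argument value is made up, and the "setup" is only building the call text. The talk's own part 1 is not reproduced. One design choice differs from the talk: it says "I return true", while this sketch returns true only when it actually inserted something.

```cpp
#include <string>
#include <vector>

// Hypothetical toy IR: a function is a name plus instruction strings.
struct ToyFunction {
    std::string name;
    std::vector<std::string> instructions;
};
struct ToyModule { std::vector<ToyFunction> functions; };

// Insert a hook call at the top of main; return true if the module changed.
bool instrumentMain(ToyModule& m, const std::string& hookName) {
    // Setup: the call to insert (the "i32 0" argument is an arbitrary example).
    const std::string call = "call void @" + hookName + "(i32 0)";
    bool modified = false;
    for (ToyFunction& f : m.functions) {  // iterate over every function
        if (f.name != "main") continue;
        f.instructions.insert(f.instructions.begin(), call);
        modified = true;
    }
    return modified;  // like runOnModule: true means "I changed the code"
}
```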
setupHooks()
This code creates "a placeholder" (a declaration) in our source program. I do not link in my instrumentation code until the very end.
The observation from setupHooks() is that I am building up a 'function' that returns void and takes one argument, which is exactly the signature of __initMain.
InstrumentEnterFunction
Why not do something simpler?
With this approach, I can pass different values as parameters based on whatever I need to do.
Steps to running pass number 4!
Get our source code set up by running our pass on it:
./../opt -load ./../../lib/LLVMHello.so -hello4 -S < hello.ll > readyToBeHooked.ll
Link in our instrumentation:
./../llvm-link readyToBeHooked.ll instrumentation.ll -S -o instrumentDemo.ll
LLVM Tools - llvm-link
Now that our files are merged, there is both a declaration and a definition for our instrumentation!
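Linking is what turns the placeholder declaration from setupHooks() into a call that actually reaches __initMain. The toy below is a rough, hypothetical model of that resolution step, not how llvm-link is implemented:

- Each module is reduced to a map from symbol name to "defined here" or "only declared".
- A definition satisfies a matching declaration.
- Two definitions of the same name are a conflict.

In the real flow, external functions such as library calls can legitimately stay declared until the native link.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy symbol table: name -> true if defined here, false if only declared.
using ToySymbolTable = std::map<std::string, bool>;

// Merge 'src' into 'dst'. A definition satisfies a declaration;
// two definitions of the same name are a conflict (returns false).
bool linkInto(ToySymbolTable& dst, const ToySymbolTable& src) {
    for (const auto& entry : src) {
        auto it = dst.find(entry.first);
        if (it == dst.end()) {
            dst.insert(entry);
        } else if (it->second && entry.second) {
            return false;  // duplicate definition
        } else {
            it->second = it->second || entry.second;  // definition wins
        }
    }
    return true;
}

// Symbols that are still only declared after linking are unresolved.
std::vector<std::string> unresolvedSymbols(const ToySymbolTable& table) {
    std::vector<std::string> names;
    for (const auto& entry : table)
        if (!entry.second) names.push_back(entry.first);
    return names;
}
```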
LLVM-Link
Grand Finale!
Run our linked .ll file (using lli, or compile it to native code with llc).
It works: we see our message before the "Bonjour" from hello.cpp!!
Going Further (Challenges/Project Ideas)
Time permitting:
Resources
More Guidance - Your LLVM Syllabus
Contributing to LLVM
Conclusion
Thank You!
Feedback Form https://tinyurl.com/fosdem18llvmintro
(Whether you watched this talk now or in the future!)
Make sure we save the output of opt
Some Gotchas
Courses Using LLVM
https://www.cs.utexas.edu/users/lin/cs380c/prog1.pdf
Tour of LLVM Project
https://blog.regehr.org/archives/1453 | http://www.linux.org/threads/llvm-toolset.6644/
Useful debugging things
The dump() method.
Build your own LLVM language
LLVM Backend information