COS320, Spring 2015: Compiling Techniques

Basic Block & CFG Edge Profiler

Implement a basic block profiler and an edge profiler, and profile information loader for each of them.

Profiling and Instrumentation

Profiling is a form of dynamic program analysis that measures dynamic behaviors of a program. Dynamic analysis is the testing and evaluation of an application during runtime, while static analysis is the testing and evaluation of an application by examining the code without executing the application. Counting the number of instructions existing in a compiled binary file is one example of static analysis, while counting the number of executed instructions when you actually run the program belongs to the category of dynamic analysis.

Code instrumentation is one way of doing profiling. Programmers can insert additional auxiliary instructions that do certain tasks when they are being executed. For example, if you want to count the number of ‘add’ instructions executed at runtime, you can insert an auxiliary instruction that increments the ‘add’ counter after each ‘add’ instruction existing in the program. Then the counter will be incremented whenever an ‘add’ instruction is executed.

LLVM Passes

In all the previous assignments, source files in one of subdirectories in tools/ are linked together with other library source files in various subdirectories of lib/ to produce an executable file (lexer, parser, tchecker, and codegen). From this assignment, you will implement an LLVM Pass, which is basically a shared library file that can be loaded at runtime from opt tool. opt takes either the LLVM assembly language format (.ll) or the LLVM bitcode format (.bc) as input and the LLVM bitcode format (.bc) as output. (.ll file can be the output format too if you use -S option) opt transforms the LLVM bitcode by applying several LLVM passes to a given input.

Some passes perform the transformations and optimizations that make up the compiler, and others do not modify the program itself but build the analysis results that are used by other transformations. An LLVM pass will be compiled as a shared library (.so), which can be loaded from opt if you specify the corresponding command-line argument.

Let’s use opt using a simple program. This is the example program we saw in the Assignment 4:

#include <stdio.h>

int main() {

int i = 0, sum = 0;

for (i = 0; i <= 10; i++)

sum += i;

printf("%d\n", sum);

return 0;

}

Compile this program to LLVM IR using this command:

$ clang -O0 -emit-llvm -S sum.c

This will produce unoptimized sum.ll file. All local variables are stored on the stack even if their addresses are not taken. Let’s apply -mem2reg pass, which promotes memory references to be register references.

$ opt -mem2reg -S -o sum_mem2reg.ll sum.ll

opt can take either .ll or .bc file as an input. Here the output file is also in .ll format, but if we don’t use -S option, the output will be the bitcode format (.bc). These two formats are semantically equivalent, so you can think of .ll file as a readable representation of .bc file. llvm-dis and llvm-as can convert .bc to .ll and vice versa respectively.

Anyway, let’s compare the original unoptimized sum.ll and the sum_mem2reg.ll file we just created. You will notice all those local variables located on the stack using alloca instructions are now in registers, which can be much faster.

The execution order of passes basically follows their specified order in the command line arguments. So, if we give options as follows:

$ opt -somepass -someotherpass ...

-somepass will be applied before -someotherpass. But the whole order of execution of passes can be changed by other factors too and is ultimately controlled by the LLVM pass manager. On the other hand, you can use some predefined options such as -O0 ~ -O3; they are not pass names themselves. If you use these, up to hundreds of predefined set of passes will be applied based on the specified options. The list of some of passes included in LLVM is here (Note that it is not a complete list of passes).

-mem2reg and many other passes are already included in LLVM’s standard pass set, so you can directly use the their command-line options to opt. But you need to use -load options before the option name to load a specific .so file to use non-standard pass you create. For example, if you create a pass named -mypass which is in MyDir/MyPass.so opt, the options will be:

$ opt -load MyDir/MyPass.so -mypass ...

Now it is time to learn how to write your own pass. Read this Writing an LLVM pass document; understanding this is crucial for this assignment. You don’t need to know about analysis groups in detail for now though. ‘hello’ example pass in the tutorial is in $LLVM_SRC_ROOT/lib/Transforms/Hello for you to see what it’s like, although it is not included in the standard set of LLVM passes.

Implement a Basic Block / CFG Edge Profiler and Profiler Information Loader

Download the archived file for as5 and extract it to the project directory.

cd $COS320_SRC_ROOT
wget http://www.cs.princeton.edu/courses/archive/spring15/cos320/homeworks/as5/as5.tar.gz
tar xfvz as5.tar.gz
rm as5.tar.gz

From this assignment, you are not going to make a new tool (=executable), so you don’t need to copy or symlink fun.l and fun.y anymore.

Files relevant to this assignment are in lib/Profiler/ and include/Profile/. BBProfileLoader.h and EdgeProfileLoader.h are in include/Profile/ directory because they will be used from another module, which you will implement in the next assignment. You will modify BBProfiler, EdgeProfiler, BBProfileLoader, and EdgeProfileLoader classes in this assignment. Don’t modify ProfilePrinter class; it is used to test the functionalities of other classes.

BBProfiler, a basic block profiler, is for getting dynamic execution counts for each basic block in a program. To do this, you need to create a global counter array that can contain execution counts of every existing basic block, and insert an instruction to every basic block that increments the corresponding counter for that block. EdgeProfiler, a CFG edge profiler, is for getting dynamic execution counts for every CFG edge in a program. Similarly, you also need a global array to store execution counts for every CFG edge. Make sure the names of global counter arrays in two profilers are different, because otherwise you will not be able to use two passes together. (= Counter-incrementing instructions from the two profilers end up incrementing the same values)

Adding counter-incrementing instructions in EdgeProfiler is trickier than the case of BBProfiler, because CFG edges are not something that actually exist in LLVM bitcode. In a function, if you can go from basic block BB1 to basic block BB2 in a program execution (either by a branch instruction or a fall-through), it is considered that there is an edge from BB1 to BB2. But a counter-incrementing instruction should belong to some basic block, because it is an instruction itself.

We will present you an algorithm to do this here. But to make this algorithm work, there should be no critical edges in the program. A critical edge is an edge which is neither the only edge leaving its source block, nor the only edge entering its destination block.

These edges must be split: a new block must be created in the middle of the edge, in order to insert computations on the edge without affecting any other edges. Even though this is a simple process, fortunately, this is not what you have to do in this assignment; -break-crit-edges pass in the LLVM will do that for you. As long as we use our passes after -break-crit-edges, we can assume there are no critical edges in an input program.

Assuming that there are no critical edges left, the algorithm can be as follows:

for each basic block BB

for each unique successor SUCC of BB

EDGE = CFG edge from BB to SUCC

if (SUCC is the only successor of BB)

insert a counter-incrementing inst for EDGE at the end of BB

(but before a terminator inst)

else

insert a counter-incrementing inst for EDGE at the start of

SUCC (but after all PHI insts)

Think about why this works for the edge profiler and why you need critical edges to be split to make this work, and include your reasoning in your README.

BBProfiler class is defined in lib/Profile/BBProfiler.cpp and lib/Profile/BBProfiler.h, and EdgeProfiler in lib/Profile/EdgeProfiler.cpp and lib/Profile/EdgeProfiler.h. They are defined as ModulePasses, which means you can examine the whole program as a total. If a program is distributed in several compilation units (ex. several C source files), you will need to combine them before you apply any ModulePass, which we will describe later.

What you basically have to do is to implement BBProfiler::runOnModule and EdgeProfiler::runOnModule methods. Of course you can add any additional member/non-member functions or variables as long as you implement the given functionalities correctly. You can modify BBProfiler.cpp/BBProfiler.h and EdgeProfiler.cpp/EdgeProfiler.h as much as you wish to implement them, but don’t delete predefined command-line options or methods; deleting them can result in compilation or autograder failures.

If you want to separate files to add some utility/helper functions required from both classes, lib/Profile/ProfileUtils.cpp and lib/Profile/ProfileUtils.h are places for that. In case you add functions to these files, you should submit these to the dropbox too. Please don’t make any more additional files; you can’t submit those other files.

For BBProfiler, the filename to store profile information can be configured using -bb-info-output-file option and is set to “bb_info.prof” by default (Refer to command-line option definitions at the top of BBProfiler.cpp). For EdgeProfiler, the option is -edge-info-output-file and “edge_info” is the default. As defined in the options, you can refer to these names as ‘outputFileName’ in your code, i.e., you can use that like a string. When the program ends, you should write all the BB/edge counter information to the output file, which means you need to insert all those code to all program exit points. (You can assume no function in a module will call ‘main’ function.) How you store this information is up to you, because you will also implement the profile information loader yourself (BBProfileLoader / EdgeProfileLoader). You can write information either as plain text or as non-human-readable binary. You can store information in any format and in any order you want. As long as you can store correct information and load the same information in your profile loaders, it is fine. These *.prof files will be generated when you run instrumented programs.

One more thing you need to do is you need to set ‘numBBs’ in BBProfiler and ‘numEdges’ in EdgeProfiler to correct values. Refer to their STATISTIC(...) definitions in cpp files. They are macros to collect statistics for a pass; this section in the Programmer’s Manual explains what they are and why we need them. You can treat both of them as integers and assign values to them or increment their values freely.

After you finish implementing BBProfiler and EdgeProfiler, it is time to implement loaders that load the profile results from your program. You need to implement runOnModule methods in BBProfileLoader and EdgeProfileLoader classes, whose source files are in lib/Profile/ and headers in include/Profile/. As above, you can add any member/non-member variables or functions to the classes as long as you don’t delete predefined things or modify the interface. Your loader should read information from *.prof file (here the option is ‘profileFileName’. Refer to cpp files) generated by instrumented programs and save that within your class instance as member variables. You also can access the llvm IR because these loaders are also ModulePasses. You can assume that the given llvm bitcode as input is the same as the one given to your BBProfiler/EdgeProfiler, which means that the input programs have CFG and no critical edges.

You should implement BBProfileLoader::getCount, EdgeProfileLoader::getCount, EdgeProfileLoader::getWeight functions using your loaded information. getCount function returns the execution count for the given basic block or CFG edge, and getWeight function returns the weight for the given edge. If an edge goes from BB1 to BB2, the weight of a single edge is computed as:

Edge BB1->BB2’s weight = execution count for the edge BB1->BB2 / sum of execution counts for all edges from BB1

For example, if BB1 has three successors BB2, BB3, and BB4, and BB1->BB2 is executed 10 times, BB1->BB3 5 times, BB1->BB4 5 times, the weight of BB1->BB2 is 10 / (10+5+5) = 0.5. The possible range of a weight is [0, 1] (includes both 0 and 1).

These functions will be the interface between the loaders and the profile information printer, the program that gets information from BB/edge profile loaders you implement and prints the information to a text file. The profile information printer is implemented in lib/Profile/ProfilePrinter.cpp/h, and these are not the files you need to modify. One thing to note is you should not modify the given bitcode files in your loaders.

As in BB/Edge profilers, you need to set ‘numBBs’ and ‘numEdges’ variables in BBProfileLoader and EdgeProfileLoaders classes too.

As usual, you can build your program using ‘make’. Make sure you remove all the warnings and you don’t have any testing printf statements uncommented left in the code when you submit.

cd $COS320_SRC_ROOT
make

If your program compiles successfully, there will be Debug+Asserts/lib/Profile.so. Now you can use your library to instrument any LLVM bitcode file (.bc) or LLVM assembly file (.ll), as long as they contain all the functions of the program. Let’s first test with our proverbial all.fun.

./Debug+Asserts/bin/codegen tests/all.fun

Before applying our instrumentation passes, we need to apply -break-crit-edges first to remove all critical edges. Here we name critical-edge-removed file ‘bench.ll’.

opt -break-crit-edges -mem2reg -S -o bench.ll all.ll

opt -load Debug+Asserts/lib/Profile.so -bb-profiler -edge-profiler -stats -debug-only=cfg-profiler -S -o bench.prof.ll bench.ll

This creates an instrumented LLVM assembly code bench.prof.ll. -stats does not affect the execution of the pass, but shows you execution stats and you can see if your stats counting is working correctly. -debug-only=cfg-profiler will print debug messages enclosed in DEBUG macro only in your profiler/loader passes. Here we used -bb-profiler and -edge-profiler together so your instrumented code will contain instrumentations for both of them. To enable this, please make sure the names of global counter arrays in two profilers are different. You can test each of them separately when you are debugging, but your passes should work together in this way too.

You can compile and run this program:

clang -o bench.prof.exe bench.prof.ll
./bench.prof.exe

If your program runs correctly, you will notice there is bb_info.prof and edge_info.prof generated in your current directory. If you want to use file names other than bb_info.prof/edge_info.prof, you can give that as command-line options (refer to options in .cpp files). You can also specify other output .ll filename using -o option.

You can even prepend -break-crit-edges to this option, but we don’t recommend it because explicitly generating critical-edge-removed .ll files will be much convenient for your debugging.

Now you run your profile information loaders with the profile information printer.

opt -load Debug+Asserts/lib/Profile.so -bb-profile-loader -edge-profile-loader -profile-printer -dump-bb-info -dump-edge-info -stats -disable-output -debug-only=cfg-profiler bench.ll

This will first run your BBProfileLoader and EdgeProfileLoader to read the information in bb_info.prof and edge_info.prof, and run ProfilePrinter to print that information to a file named prof_dump.txt. -bb-profile-loader and -edge-profile-loader options can actually be omitted here because ProfilePrinter pass ‘requires’ both of them to be run before it; see ProfilePrinter::getAnalysis method for why it works. This section provides a background explanation. And we have disabled the output because we are not modifying a program at all in all these loaders and printers.

The filename ‘prof_dump.txt’ can also be changed by a command-line option defined in ProfilePrinter.cpp.

‘prof_dump.txt’ contains a human-readable representation of the information loaded by your basic block/edge profile information loaders. Check if the information written in prof_dump.txt is correct.

Although we tested with all.fun just before, now we are not limited to FUN files for testing, because what opt works on is not FUN files but .ll or .bc files. You can compile any C files or even files written in other languages (if LLVM supports that frontend) to LLVM bitcode format, and use that to test your passes. One thing to note is that if the program code is distributed among multiple compilation units (=source files), you need to assemble them into one before testing your passes using llvm-link, because your passes are ModulePasses. What llvm-link does is just assemble .ll files into one; don’t confuse with linkers like gnu-ld.

But as you have noticed, all these commands are very tedious to enter manually every time. We provide you with three benchmarks: sum (simple program that sums the numbers from 1 to N), and wc, and yacc.

cd $COS320_SRC_ROOT/benches/BENCH_NAME/src
make

Here replace BENCH_NAME with a real directory name. This will do all the steps listed above for you - This will merge source files into one, break critical edges, instrument the program using your profilers, execute the instrumented program and check if the result is correct, and then run the profile information printer to generate ‘prof_dump.txt’. You can do ‘make clean’ to clean all the generated files.

Now you can even add several more programs to benches/ directory to test. What you need to do is make a similar directory structure as sum/ or wc/, and make benches/BENCH_NAME/exec_info file containing the command-line arguments and reference output.

Submit all of these files to dropbox here:

README
BBProfiler.cpp / BBProfiler.h
EdgeProfiler.cpp / EdgeProfiler.h
BBProfileLoader.cpp / BBProfileLoader.h
EdgeProfileLoader.cpp / EdgeProfileLoader.h

It is totally OK if you have not modified some header files among these. But please submit all these files anyway.

And in case you modified ProfileUtils.cpp and ProfileUtils.h, you should submit these too.

ProfileUtils.cpp / ProfileUtils.h

Tips and Tricks

Below is the list of miscellaneous tips and tricks. The listed order is not the order of importance.

From this assignment, it is not very advisable to use IRBuilder class anymore. IRBuilder is convenient when you generate llvm code from scratch as in the previous assignment, but not very useful when you analyze and modify an existing llvm code. When you insert instrumentation code in an existing program, you will iterate through existing functions, basic blocks, and instructions and insert instructions when you need to. LLVM Programmer’s Manual, and especially Helpful Hints for Common Operations part in it can be useful introduction on how to do these. To create an instruction, you can either create one using ‘new’ and insert it to a basic block as in the Helpful Hints for Common Operations page above, or some classes provide helpful static methods to do this, like CallInst::Create.

Ironically, generating instructions that write information to *.prof will involve more work than actual counting itself. To write information to *.prof, you can use any functions provided in the standard C library. While you can use C++ functions too, I don’t recommend them because generating corresponding LLVM code will be much more complicated.

To create a global counter array, GlobalVariable class can be useful.

LLVM’s backend static compiler llc supports various architectures (x86, MIPS, …) to generate assembly code. One of the architecture supported is ‘cpp’, which means you can generate a .cpp file from an .ll file. The generated cpp files contains LLVM IR operation required to generate the .ll file, which can be helpful to figure out which functions or classes to use.

llc -march=cpp test.ll

Here are the reference prof_dump.txt outputs for the given benchmarks:

Note that this is only applicable if you use the LLVM version 3.5 and compile the code with the same options as in the provided Makefile.

Sometimes it is helpful to explore LLVM source code to get some information. (For example, you might want to know how a certain function is being used) You can do that using Doxygen generated documentation. But you can also directly browse the source code in $LLVM_SRC_ROOT/. ctags can be convenient to browse the code.

References

You can find all the related documentations in the LLVM 3.5 documentation page.

Available from: Tuesday, 2 April 2015