Warning: This assignment can be a challenging one (a lot more than AS4 and AS5) and will be worth more in scores. Start early! And you can’t use any late days on this assignment (because it’s due on Dean’s date).
Implement superblock formation.
to this:
Then you can download the shared library here.
Note that you need to make to symbol link again after you run ‘make clean’.
There can be various detailed strategies on how to do a), but in this assignment we will use the algorithm covered in the lecture. Please follow this algorithm closely; otherwise it will be very hard to check the correctness of your code. You will need the profile information loaders to do this.
You are going to select traces only within innermost loops in this assignment. Below is the pseudocode for the trace selection algorithm. This has to be done for every innermost loop that exists in a module.
i = 0; mark all BBs unvisited while (there are unvisited nodes) do seed = unvisited BB with largest execution frequency trace[i].push_back(seed) mark seed visited current = seed // Grow trace forward while (true) do next = best_successor_of(current) if (next == NULL) then break traces[i].push_back(next) mark next visited current = next endwhile current = seed // set current back to seed // Grow trace backward while (true) do prev = best_predecessor_of(current) if (prev == NULL) then break traces[i].push_front(prev) mark prev visited current = prev endwhile i++; endwhile |
The pseudocode to select best successor/predecessor is as follows:
best_successor_of(BB) e = control flow edge with highest probability leaving BB if (e is a backedge) then return NULL endif if (probability < THRESHOLD) then return NULL endif d = destination of e if (d is visited) then return NULL endif if (d is outside of the loop) then return NULL endif return d endprocedure best_predecessor_of(BB) if (BB is the loop header) return NULL e = control flow edge with highest probability entering BB if (probability < THRESHOLD) then return NULL endif s = source of e if (s is visited) then return NULL if (s is outside of the loop) then return NULL endif return s endprocedure |
In this assignment, you are going to use Loop class, including methods inherited from its base class LoopBase, a lot. When selecting a BB with the highest probability, you should iterate BBs within a loop using Loop::block_iterator; if there is a tie, you should select the first one encountered when you are iterating BBs using LoopBase::block_begin/LoopBase::block_end. The below is the template code showing how you iterate BBs within a loop using Loop::block_iterator.
for (Loop::block_iterator bi = l->block_begin(), be = l->block_end(); bi != be; ++bi) { BasicBlock *bb = *bi; ... } |
When iterating predecessors and successors of a BB, you can use pred_iterator (or const_pred_iterator) and succ_iterator (or succ_const_iterator) with pred_begin/pred_end and succ_begin/succ_end functions. If there is a tie between predecessors/successors with the highest probability, you should select the first one encountered in the iteration as well. Below is an example of iterating predecessors and successors.
The branch bias threshold is configurable by the option ‘-superblock-branch-bias-threshold’ defined in Superblock.cpp and currently set to 0.7. You can use ‘biasThreshold’ to access this value. Don’t change or delete this option and the default value. And note that a basic block can belong to at most one trace, because when a BB is ‘marked’, it is not a candidate of next BB selection or successor/predecessor selection anymore.
When duplicating a BB, make the cloned BB’s name’s suffix ‘.clone’. So, ‘mybb’ will be cloned to ‘mybb.clone’. And every time you clone a BB, increment ‘numClonedBBs’ statistics variable defined in SuperblockFormation.cpp to collect statistics. And you are not allowed to create any other BBs other than cloned BBs, or delete any existing BBs, even if they are empty (= have only one TerminatorInst).
Here we are not going to impose any limit on the code size expansion, because it is nearly impossible to exceed the code size by 2x anyway in the current setting.
If BB2 has a definition of a variable ‘myvar’, its duplicate BB2’ has a definition of its duplicate variable ‘myvar.clone’. If BBX has a use of ‘myvar’, BBX should have a PHI node in the beginning and its use of ‘myvar’ should be converted to that PHI node. This is just one example; there can be many possible cases which can be tricky. Think about how to reconstruct a valid SSA form again and implement it. Describe how you solved the problem in your README.
If your program compiles successfully, there will be Debug+Asserts/lib/Superblock.so. Let’s first test with our proverbial all.fun.
Before applying superblock formation, we need to run BB/edge profilers we made in AS5 to generate CFG profiling data. We add ‘-mem2reg’ to see whether your PHI node handling is correct or not.
Now you have bb_info.prof, edge_info.prof, and prof_dump.txt.
-debug-only=superblock-formation will print debug messages enclosed in DEBUG macro only in your superblock formation pass. And we compile and run this superblock-formed program:
Your program output and the return value should be the same as the original program. all.fun does not take any input file, but if the program takes an input file, profiling input is usually smaller than the final testing input.
To compare the execution time, to be consistent, we optimize both the unmodified LLVM bitcode file and the superblock-formed file with -O3 and compare their execution time.
all.fun is too simple so there would be no difference in the execution time anyway. But actually, even if you run bigger programs with longer execution time, chances of seeing noticeable difference are small. Possible reasons are
So don’t worry if your program does not show any speedup; you will not be graded on your program’s speedup.
One way to check the correctness of your pass is to see what your new CFG looks like. opt has -view-cfg and -view-cfg-only options for this; these options show your CFGs in pictures. see Tips and Tricks section for the detailed instruction.
The other way is to see the output of profile information printer we used in AS5. We profiled the unoptimized bitcode file to generate profiling data for superblock formation above, but this time, this is to check if the superblock formation pass is running as we expected.
We specified other filenames for *.prof and prof_dump.txt files not to overwrite the files generated above. See the result of prof_dump.sb.txt and compare it to the original prof_dump.txt and check the difference makes sense.
Here replace BENCH_NAME with a real directory name. Don’t use parallel options like -j8 here, because several tests may clobber each other’s outputs. This will do all the steps listed above for you. prof_dump.sb.txt will be generated automatically as well.
(Note that this is only applicable if you use the LLVM version 3.5 and compile the code with the same options as in the provided Makefile)
Now you can add several more programs to benches/ directory to test. What you need to do is make a similar directory structure as one of the benchmarks, and edit benches/BENCH_NAME/exec_info file to specify command-line arguments, testing commands, and reference outputs. You can also edit benches/BENCH_NAME/compile_info if you need special cflags.
Tips and Tricks
Below is the list of miscellaneous tips and tricks. The listed order is not the order of importance.
Note that LoopInfo has been already declared as a required analysis in SuperblockFormation::getAnalysisUsage.
If you are using the forwarding routine to one of FC lab servers in .bashrc (it’s in AS1 page), add ‘-X’ option to the ssh command there too. If you are not using it, add ‘-X’ option to your second ssh command too.
To see if your X forwarding works, run xclock after you log in to one of FC lab servers. If a clock window appears, you are all set.
-view-cfg option displays the CFG with all the instructions, while -view-cfg-only does not show any instructions. -view-cfg-only is often easier to use because it is simpler. To run it:
This displays the CFG for each function within the .ll or .bc file one by one. If you want to view a specific function only or view the CFG in the middle of a pass execution, you can use Function::viewCFG or Function::viewCFGOnly function.
In this case, when selecting the best predecessor from bb2, you should select bb4. It exceeds the threshold(0.7), so it satisfies the requirement. Don't compute the edge weight of bb4->bb2 as 1.0 / (0.9 + 1.0). Just use EdgeProfileLoader::getWeight(edge).
Note that this is only applicable if you use the LLVM version 3.5 and compile the code with the same options as in the provided Makefile.
You can find all the LLVM related documentations in the LLVM 3.5 documentation page.
Available from: Friday, 24 April 2015
Due date: Tuesday, 12 May 2015, 5:00pm (Dean’s date)