Compile-Time Optimizations in Polly
Improving the compile-time of polyhedral compilation tools, and of Polly in particular, has been an issue for some time. Recently, however, a lot of progress has been made. The aim of this project is to improve the compile-time of Polly by a large factor. The prominent changes I plan to make are: changes to the integer representations in the Integer Set Library (ISL) that Polly uses, and changes to the position of Polly in the LLVM pass pipeline and its interactions with other passes.
The general objective of my project is to reduce the compile-time of Polly. A comparison of LLVM with options -O3 vs. -O3 -polly shows that there is scope for improvement in various aspects. This could include general work like changes to the ISL AST generator, or targeted improvements like the following:
Criteria of Success
The following are some criteria of success I am aiming for:
Polly is a loop-optimization pass in the LLVM compiler that relies on polyhedral compilation techniques to generate code for multi-core and GPU architectures. Broadly, Polly performs optimizations like tiling and scheduling on loop sections of programs. Polly has been very successful because it combines the strengths of open-source compilation, the flexibility of LLVM-IR, and the formalism of the polyhedral compilation framework.
In spite of the large performance benefits that it provides, Polly is already reasonably fast on a variety of programs. On loop programs, the compile-times of Polly are around 200ms for small loop kernels and at most 1-2 seconds for larger multi-loop kernels. On non-loop programs, using Polly incurs an overhead of around 2-10% in compile-time. Though these are good numbers for scientific experiments, significant compile-time reductions are still possible. I aim to achieve them by making changes to the libraries that Polly uses and through other targeted improvements.
Improvements to the Polly-ISL implementation
Polly relies heavily on the Integer Set Library (ISL) at its heart for the various operations involving rational/integer polyhedra. For precision, ISL internally uses multi-precision arithmetic libraries, which cause performance bottlenecks in a compiler setting like Polly.
In the winter of 2014/2015, I worked on using the long long type instead of the GNU MP (GMP) type that ISL usually relied on. This change yielded around a 60% compile-time improvement over imath and around 34% over GMP, and received positive feedback from the community.
In certain computations, however, using long long integers is insufficient as it leads to overflow. The natural extension is to use __int128 integers and to ensure in ScopDetection that we only handle integers for which we are sure the computations will not overflow. This would give us the full performance benefits without limiting the optimizations performed.
Another direction is to implement a hybrid approach, as has been previously discussed on the Polly mailing list. This involves keeping a struct of the following form:
long long int_long;   /* fast path; a multi-precision fallback would accompany it */
In the implementation that uses the above structure, all operations would by default be done on the long long field. The code would perform range checks in each individual operation, such as isl_int_add, isl_int_sub, etc., and fall back to imath when the long long/__int128 computation is insufficient. This method would guarantee compile-time improvements.
The latest performance results on gemm.c from the PolyBench 4.0 benchmark suite are favourable, showing around a 50% improvement.
[Figure: compile-time of Polly (long long) vs. the default configuration]
We also see a significant compile-time reduction of 50-60% for a large number of benchmarks.
Interactions of LLVM passes with Polly and their schedules
Polly currently runs outside LLVM's standard pass pipeline. Before running its first analysis pass, ScopDetection, Polly schedules various LLVM optimizations like Tail Call Elimination, CFG Simplification, etc. These canonicalization passes traditionally enable better polyhedral analysis, but they add overhead to the Polly pass for non-loop programs. The following table summarizes these passes and their characteristics.
A key step towards making Polly a standard optimization in LLVM is to find a location for the Polly pass among the existing passes such that it can reuse previously computed analysis results that have not yet been invalidated. This avoids redundant recomputation.
However, this is not trivial: we need to consider the interactions of Polly with other LLVM passes. As an example, consider the interaction of Loop Invariant Code Motion (LICM) and Polly: programs that have been subjected to LICM are not readily amenable to Polly's optimizations. Hence we first need to modify Polly so that it can optimize LICM'd code before Polly can actually be moved to a later position in the pass pipeline.
Pushing Polly further back in the pipeline would also help us optimize C++ programs with templates, where loops are only exposed after inlining. This rescheduling of Polly within LLVM would benefit programs that use libraries like boost::ublas.
I am a third-year Computer Science undergraduate student at the Indian Institute of Technology Hyderabad (IIT Hyderabad). My research areas include compilers as well as machine learning. I started learning about compilers a year ago and have gained a good amount of experience in this vast area by implementing various projects and studying the algorithms and implementations in LLVM.
As my first major compiler project, I wrote a complete compiler for the COOL language, including an x86 backend. I have also written an LLVM pass for register allocation that prints an interference graph, and I have a good understanding of the LLVM code-base. I have studied LLVM source code related to SSA construction, interval analysis, and alias analysis, as well as various papers about alias analysis, DJ graphs, loop optimizations, and loop identification. I am also proficient in C/C++.
I do not have any other commitments for this summer.