1 of 43

The Impact of Compiler-Based Performance Optimizations on Security

7 February 2020

Michael D. Brown

Software Assurance Associate Branch Head

Research Scientist, GTRI-CIPHER


2 of 43

Introduction

Modern compilers are focused on two goals when converting source code into executable binaries.

  1. Correctness: Preserving the functional semantics of the source code in the binary.
  2. Performance: Producing an optimal binary with respect to metrics like execution speed, code size, memory footprint, etc.

Security is typically not a goal. This omission has led to a number of security weaknesses introduced by compiler optimizations.

In this talk, we will explore recent research on security concerns related to optimizing compilers.


3 of 43

Agenda

  • Background: Compiler Performance Optimizations
  • Problem: Security Weaknesses Introduced by Optimizations
  • Study: Impact of Optimizations on Code Reuse Attack Susceptibility
    • Do compiler optimizations improve the quantity / quality of code reuse gadgets? If so, why?
  • Mitigation: Potential Solutions and Mitigations


4 of 43

What are Compiler Performance Optimizations?

Modern compilers are tasked with producing executable binaries from source code that are both functionally correct and performant.

Source code is first converted into an Intermediate Representation (IR) (similar to Assembly language).

While in IR, software analysis and software transformation techniques are used to alter the code to improve performance without altering functional semantics.

Optimized code is then used to generate a machine executable binary.

Architecture of an Optimizing Compiler [1]


5 of 43

Example: Constant Propagation (and Dead Code Elimination)

Optimizes code by carrying forward constant values or performing static computations at compile time, simplifying the program.

Simplifications may enable later optimizations, such as Dead Code Elimination.

Original:

    int foo(int z){
        int a = 1, b = 2;
        int c = 1 + b;
        int d = 100;

        if(c > 0){
            d = d - z;
        } else {
            d = a + z;
        }

        return d;
    }

After propagating a, b, and d into their uses:

    int foo(int z){
        int a = 1, b = 2;
        int c = 1 + 2;
        int d = 100;

        if(c > 0){
            d = 100 - z;
        } else {
            d = 1 + z;
        }

        return d;
    }

After folding and propagating c:

    int foo(int z){
        int a = 1, b = 2;
        int c = 3;
        int d = 100;

        if(3 > 0){
            d = 100 - z;
        } else {
            d = 1 + z;
        }

        return d;
    }

After Dead Code Elimination removes the unused locals and the unreachable branch:

    int foo(int z){
        int d = 100;
        if(true){
            d = 100 - z;
        }
        return d;
    }

Final simplified form:

    int foo(int z){
        return 100 - z;
    }

6 of 43

Example: Function Cloning via Interprocedural Constant Propagation

Creates optimized variants of functions based on interprocedural constant propagation. Sacrifices code size, but some function calls will execute faster.

Original:

    int bar(int z) {
        int b = foo(z, z);
        int c = foo(1, z);
        return b + c;
    }

    int foo(int x, int y)
    {
        if(x > 0){
            y = y * 2;
        } else {
            y = y + 8;
        }
        return y;
    }

After cloning foo for the constant call site foo(1, z):

    int bar(int z) {
        int b = foo(z, z);
        int c = foo_1(z);
        return b + c;
    }

    int foo(int x, int y)
    {
        ...
    }

    int foo_1(int y)
    {
        if(1 > 0){
            y = y * 2;
        } else {
            y = y + 8;
        }
        return y;
    }

After constant propagation and dead code elimination inside the clone:

    int bar(int z) {
        int b = foo(z, z);
        int c = foo_1(z);
        return b + c;
    }

    int foo(int x, int y)
    {
        if(x > 0){
            y = y * 2;
        } else {
            y = y + 8;
        }
        return y;
    }

    int foo_1(int y)
    {
        return y * 2;
    }

7 of 43

Performance Optimizations in Modern Compilers

Modern compilers implement dozens of individual optimizations, many of which are interdependent.

To simplify the process of selecting which optimizations to apply, optimization levels are used. For example, GCC [2] defines several optimization levels:

    • -O0 : Optimization Level 0. Disables almost all optimizations. Default.
    • -O1 : Optimization Level 1. Enables optimizations that take relatively little incremental compilation time.
    • -O2 : Optimization Level 2. Enables almost all optimizations that don’t involve space-speed tradeoffs. Will increase compilation time.
    • -O3 : Optimization Level 3. Enables all level 2 optimizations, as well as space-speed tradeoff optimizations. Will increase compilation time even more.


8 of 43

Agenda

  • Background: Compiler Performance Optimizations
  • Problem: Security Weaknesses Introduced by Optimizations
  • Study: Impact of Optimizations on Code Reuse Attack Susceptibility
    • Do compiler optimizations improve the quantity / quality of code reuse gadgets? If so, why?
  • Mitigation: Potential Solutions and Mitigations


9 of 43

Security Weaknesses Introduced By Performance Optimizations

Problem: Despite years of research formally proving compilers correct with respect to the source code’s functional semantics, numerous instances have arisen where compilers violate the security guarantees implemented in source code.

Recent work in 2015 by D’Silva et al. [1] studied this situation and coined the term “Correctness-Security Gap” to describe it.

The underlying problem is that compilers were never designed to satisfy security criteria. It is the subject of open debate whether or not violation of these criteria by a compiler constitutes buggy behavior.

Instances of the correctness-security gap fall into three categories: Elimination of Security-Oriented Code, Information Leaks, and Side Channel Introduction.


10 of 43

Example: Elimination of Security-Oriented Code

A common example occurs as a result of the Dead Store Elimination optimization.

This optimization eliminates memory write operations to memory locations that will not be read later.

Data sanitization code that blanks out sensitive data in memory triggers this optimization, resulting in the removal of the security-oriented code.

This is also an example of information leakage, as sensitive data might later be recovered by an attacker.

Vulnerable Security-Oriented Function [1]
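The pattern can be sketched in C (a minimal sketch, not the exact function from [1]; `process_password` and `secure_memzero` are hypothetical names):

```c
#include <string.h>

/* Vulnerable pattern: the buffer is never read after the memset, so the
 * compiler may treat the write as a dead store and eliminate it, leaving
 * the secret in memory. */
void process_password(char *pw, size_t len) {
    /* ... authenticate with pw ... */
    memset(pw, 0, len);   /* may be silently removed by DSE */
}

/* A common workaround: write through a volatile pointer so each store is
 * treated as an observable side effect that cannot be eliminated. */
void secure_memzero(void *buf, size_t len) {
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}
```

C11's `memset_s` and glibc's `explicit_bzero` serve the same purpose where available.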


11 of 43

Example: Side Channel Introduction

A common example occurs as a result of the Common Subexpression Elimination optimization.

This optimization eliminates redundant computations of subexpressions, reducing execution time.

This optimization can break side-channel defenses that are designed to ensure constant computation time at conditional branches.

Vulnerable Security-Oriented Function Before and After CSE Optimization [1]
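A common illustration (not the exact function from [1]): a branchy compare exits early at the first mismatch, so its execution time depends on secret data, and attempts to "balance" both branches with duplicated work can be undone by CSE merging the duplicated subexpressions. The robust defense avoids data-dependent branches entirely, leaving CSE nothing to merge:

```c
#include <stddef.h>

/* Constant-time compare: accumulates differences with no data-dependent
 * branches, so the work done is identical for every input. */
int ct_compare(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= a[i] ^ b[i];   /* same work wherever the bytes differ */
    return diff != 0;          /* 0 if equal, 1 otherwise */
}
```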


12 of 43

Agenda

  • Background: Compiler Performance Optimizations
  • Problem: Security Weaknesses Introduced by Optimizations
  • Study: Impact of Optimizations on Code Reuse Attack Susceptibility
    • Do compiler optimizations improve the quantity / quality of code reuse gadgets? If so, why?
  • Mitigation: Potential Solutions and Mitigations


13 of 43

Gadget-Based CRAs: Return Oriented Programming (ROP)

  • In 2007, Shacham proposed the first gadget-based code reuse attack method, called Return-Oriented Programming (ROP) [5].
  • Gadgets are used like complex instructions, and the total set of gadgets available to the attacker is their ISA.
  • Gadgets end in a return instruction, which the attacker can exploit to maintain control flow.
  • Attacks of this type are difficult to detect and defend against. Shacham’s paper has been cited over 1,300 times, mostly by authors devising increasingly complex defenses and novel attacks.

[6]


14 of 43

Gadget-Based CRAs: Jump Oriented Programming (JOP)

  • Most of these defenses were aimed at protecting the program stack.
  • In response, JOP [6, 7] was introduced. JOP uses gadgets ending in indirect jump and call instructions to maintain control flow instead of relying on return instructions and the stack.
  • JOP requires the use of a special purpose gadget, called a dispatcher that handles control flow.
  • Call-Oriented Programming (COP) is a call-only derivative of JOP [8].

[6]


15 of 43

Gadget Types

Functional gadgets perform computational tasks. They are used like individual instructions in an ISA.

Special purpose gadgets are used as scaffolding to create gadget chains. They are mostly used to create JOP and COP attacks; however, syscall gadgets are useful in all attack models.

Attackers find these gadgets by searching for gadget-producing instructions: returns, indirect calls, and indirect jumps.

x86 and x86-64 have variable length instructions, so it is possible to decode instructions from an offset other than the original instruction boundary. Gadgets found using this method are called unintended gadgets.


16 of 43

Motivation

Our recent work on the effects of software debloating (another type of software transformation) has shown that even small changes in software can result in large changes to code reuse gadget populations, including the introduction of new gadgets [3].

This led us to question what effect compiler optimizations have on code reuse gadget populations: are they introducing weaknesses against Code Reuse Attacks?

We set out to answer these motivating questions:

  1. To what degree do compiler optimizations introduce new gadgets in optimized code?
  2. To what degree do compiler optimizations negatively impact (i.e., increase) the number and quality of code reuse gadgets in optimized code?
  3. Which specific optimizations are problematic, and can we improve them to mitigate negative impacts?


17 of 43

Study Methodology

We conducted our study in a top down manner to answer our motivating questions.

Using two different production compilers, GCC v7.4.0 and Clang v8.0.1, we built unoptimized (-O0) and optimized (-O1, -O2, -O3) variants of 21 different programs and analyzed them to learn how gadget populations differ at various optimization levels.

We then drilled deeper and built over 900 different “single-optimization” variants and analyzed them to learn how gadget populations are influenced by individual optimizations. We identified particularly interesting optimizations by identifying outliers in the resulting data set.


18 of 43

Study Methodology - Benchmarks

We selected 21 different software packages that varied in size, complexity, and functionality.

Seven of the benchmarks are commonly used Linux packages:

    • Bftpd v5.1
    • cUrl v7.65.0
    • git v2.21.0
    • gzip v1.10
    • httpd v2.4.39
    • lmdb v0.9.23
    • sqlite v3.28.0

Fourteen of the benchmarks are from SPEC 2006:

    • 401.bzip2
    • 403.gcc
    • 429.mcf
    • 433.milc
    • 435.gromacs
    • 444.namd
    • 445.gobmk
    • 453.povray
    • 456.hmmer
    • 458.sjeng
    • 462.libquantum
    • 470.lbm
    • 471.omnetpp
    • 482.sphinx

19 of 43

Study Methodology - Metrics

We use previously established metrics for analyzing code reuse gadget sets in program binaries [3], which are calculated using a static analysis tool, GSA (GadgetSetAnalyzer) [3, 4].

  1. Gadget Introduction Rate
    • Calculated as the percentage of gadgets in optimized code that are not present in the unoptimized code.
  2. Functional Gadget Set Expressivity
    • Measures the computational power of a set of gadgets against different levels (Practical Exploits, ASLR-Resistant Exploits, Turing Completeness).
    • Calculated as the proportion of computational classes satisfied per level.
  3. Special Purpose Gadget Availability
    • Useful to identify which classes of exploits (i.e., JOP, COP) can be constructed.
    • Calculated as the number of special purpose gadget categories available in the gadget set, out of a total of 10.
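The first metric can be sketched as follows (a simplified illustration; GSA extracts gadgets from binaries, whereas here they are represented as plain strings and `introduction_rate` is a hypothetical helper):

```c
#include <stddef.h>
#include <string.h>

/* Gadget introduction rate: the percentage of gadgets in the optimized
 * binary that are absent from the unoptimized binary. */
double introduction_rate(const char **opt, size_t n_opt,
                         const char **unopt, size_t n_unopt) {
    size_t introduced = 0;
    for (size_t i = 0; i < n_opt; i++) {
        int found = 0;
        for (size_t j = 0; j < n_unopt; j++) {
            if (strcmp(opt[i], unopt[j]) == 0) { found = 1; break; }
        }
        if (!found)
            introduced++;
    }
    return n_opt ? 100.0 * (double)introduced / (double)n_opt : 0.0;
}
```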


20 of 43

Results – Gadget Introduction

Our prior work indicates that software transformations can have drastic effects on code reuse gadget populations.

We expect that some compiler optimizations will remove gadgets, while others will likely introduce new ones, in particular the space-speed tradeoff optimizations at –O3 that duplicate code.


21 of 43

Results – Gadget Introduction


22 of 43

Results – Gadget Introduction

Key Observations on Gadget Counts:

    • Clang typically produces unoptimized binaries with far fewer gadgets. (20 of 21 benchmarks)
    • However, Clang optimizations almost always significantly increase the number of gadgets. (20 of 21 benchmarks)
    • In contrast, GCC optimizations almost always significantly reduce the number of gadgets. (2/3 of variants)
    • Clang is still generally better by count.

Key Observations on Gadget Introduction Rate:

    • Regardless of compiler used, the vast majority of gadgets present in optimized code are not present in unoptimized code.
    • Smallest observed gadget introduction rate was 68.5%.
    • Almost all variants have introduction rates above 85%.
    • Not necessarily a bad thing; the quality of the gadget sets is what really matters [3].


23 of 43

Results – Coarse-Grained Qualitative Analysis

Given the high rates of gadget introduction observed, we expect that we will see interesting qualitative gadget set results based on lessons learned in prior work [3].

We start with a coarse-grained qualitative analysis in which we observe changes in our gadget set metrics at different optimization levels.


24 of 43

Results – Coarse-Grained Qualitative Analysis (GCC)


25 of 43

Results – Coarse-Grained Qualitative Analysis (Clang)


26 of 43

Results – Coarse-Grained Qualitative Analysis

Key Observations for Gadget Set Expressivity:

    • In the vast majority of cases, optimized code is more expressive than unoptimized code.
    • Increases in expressivity are unexpectedly non-linear: in some cases, increasing the optimization level results in a smaller increase in expressivity.
    • This means that higher-order optimizations can clean up after lower-order optimizations.
    • It also means that we cannot make informed security/performance tradeoffs simply by choosing an optimization level.

Key Observations for Special Purpose Gadget Availability:

    • In the majority of cases, optimizing the code does not change which types of special purpose gadgets are available.
    • It is more likely for optimizations to eliminate categories of special purpose gadgets than it is to introduce new ones.
    • Clang was observed to introduce new special purpose gadget types more frequently than GCC (33.3% vs 12.7%).
    • Once again, the data is unexpectedly non-linear.


27 of 43

Results – Fine-Grained Qualitative Analysis

The results of our coarse-grained analysis exposed non-linearity. This is unexpected because increasing the optimization level typically means we perform all lower level optimizations in addition to a new set of optimizations.

Since some optimizations can create new optimization targets for others, interactivity between individual optimizations is a partial cause of this non-linearity.

The unpredictability of unintended gadgets remains another major factor.

We need to dig deeper and look at individual optimizations. In our fine-grained analysis, we generated over nine hundred different variants and compared them to the appropriate baseline:

    • Individual –O1 optimizations vs. –O0 variant
    • Individual –O2 optimizations vs. –O1 variant
    • Individual –O3 optimizations vs. –O2 variant


28 of 43

Results – Fine-Grained Qualitative Analysis


29 of 43

Results – Fine-Grained Qualitative Analysis

How about we just talk about trends we observed.....


30 of 43

Results – Fine-Grained Qualitative Analysis

Key Observations:

    • The vast majority of single-optimization variants had qualitative results within the same ranges we observed in the coarse-grained analysis.
    • In these variants we generally observed relatively small increases in gadget set expressivity (< 3 computational classes), and the introduction of one additional category of special purpose gadgets.
    • We also observed positive impacts of the same magnitude in a significant number of variants.
    • These effects were observed across both compilers, all benchmarks, and optimizations.
    • This suggests that there is a level of “noise” in gadget set quality that occurs as a result of optimization, which is consistent with our results in prior work.

Is there signal in this noise?

    • We did observe a number of outliers, where negative impacts to gadget set quality were observed that were of significantly larger magnitude than the noise we observed.
    • We call these the “dirty half-dozen” transformations. GCC: Frame Pointer Omission, IPA-CP Function Cloning, Shrink Wrapping, Peephole Optimizations, and CSE-based Jump Following; Clang: Tail-Call Elimination.


31 of 43

Root Causes - Noise

We performed differential binary analysis on the variant binaries with IDA [9] and BinDiff [10] to identify the root causes of the effects we observed.

In general, the noise we observed was the result of “churn” that occurs during software transformation.

We observed that increases in both qualitative metrics could be explained largely by the introduction of new, short unintended gadgets, typically arising from data embedded in the instruction stream, such as branch offsets.

This is most prevalent for special purpose gadget introduction. Syscall and Intra-Stack Pivot gadgets were disproportionately affected because they are shorter than four bytes in length:

    • CD 80 (int 0x80)
    • 0F 34 (sysenter)
    • 0F 05 (syscall)
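The two-byte encodings above can be found by a naive byte scan that ignores instruction boundaries, which is exactly how unintended gadgets are located (a minimal sketch; `contains_syscall_bytes` is a hypothetical helper):

```c
#include <stddef.h>

/* Scan raw code bytes for the short syscall encodings at every offset,
 * not only at intended instruction boundaries. */
int contains_syscall_bytes(const unsigned char *code, size_t len) {
    for (size_t i = 0; i + 1 < len; i++) {
        if ((code[i] == 0xCD && code[i + 1] == 0x80) ||  /* int 0x80 */
            (code[i] == 0x0F && code[i + 1] == 0x34) ||  /* sysenter */
            (code[i] == 0x0F && code[i + 1] == 0x05))    /* syscall  */
            return 1;
    }
    return 0;
}
```

For example, the bytes of `mov eax, 0x0080CD37` (B8 37 CD 80 00) contain the sequence `CD 80` purely by accident.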


32 of 43

Root Causes – Duplication of Gadget Producing Instructions

A significant share of gadget introduction was caused by the duplication of gadget-producing instructions. This occurs as a logical consequence of IPA-CP Function Cloning.

Before cloning (one return instruction):

    int foo(int x, int y)
    {
        if(x > 0){
            y = y * 2;
        } else {
            y = y + 8;
        }
        return y;
    }

After cloning (the gadget-producing return instruction is duplicated):

    int foo(int x, int y)
    {
        if(x > 0){
            y = y * 2;
        } else {
            y = y + 8;
        }
        return y;
    }

    int foo_1(int y)
    {
        return y * 2;
    }

33 of 43

Root Causes – Duplication of Gadget Producing Instructions

A significant share of gadget introduction was caused by the duplication of gadget-producing instructions. This happens in both simple and complex transformations.


34 of 43

Root Causes – Special Purpose Gadgets as Optimizations

In one optimization, Tail Call Elimination implemented in Clang, we observed the intentional creation of instruction sequences that can be used as JOP Dataloader gadgets.
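A minimal sketch of the situation (hypothetical function names): a call through a function pointer in tail position can be lowered to an indirect jump once the tail call is eliminated.

```c
/* With tail-call elimination, the compiler can replace the call+ret pair
 * here with a single indirect jump through fn, and that indirect-jump
 * sequence is usable as a JOP gadget ending. */
int apply(int (*fn)(int), int x) {
    return fn(x);   /* tail call */
}

static int twice(int x) { return x * 2; }
```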


35 of 43

Agenda

  • Background: Compiler Performance Optimizations
  • Problem: Security Weaknesses Introduced by Optimizations
    • Removal of Security Controls
    • Information Leakage
    • Side Channel Introduction
  • Study: Impact of Optimizations on Code Reuse Attack Susceptibility
    • Do compiler optimizations improve the quantity / quality of code reuse gadgets? If so, why?
  • Mitigation: Potential Solutions


36 of 43

Solutions - Noise

A number of static software transformation techniques have been proposed by Onarlioglu et al [11] that can eliminate unintended gadgets.

For example, constants in a program that correspond to the encoding of a return instruction can be transformed fairly easily to eliminate unintended gadgets terminating at that location:
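A source-level sketch of the idea with hypothetical values (a real pass such as G-Free [11] operates on machine code in the back end, since a compiler would simply re-fold these constants):

```c
#include <stdint.h>

/* The immediate 0x00C31337 contains the byte 0xC3 (the x86 'ret' opcode),
 * so embedding it directly creates a potential unintended gadget ending. */
uint32_t original(void) {
    return 0x00C31337u;
}

/* Rewritten: the folded constant 0xFF3CECC8 and the mask 0xFFFFFFFF
 * contain no 0xC3 byte, and one cheap XOR reconstructs the same value
 * at run time. */
uint32_t transformed(void) {
    uint32_t v = 0x00C31337u ^ 0xFFFFFFFFu;  /* encoded as 0xFF3CECC8 */
    return v ^ 0xFFFFFFFFu;
}
```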

These techniques are typically not implemented in modern compilers because they are machine-specific and cannot be implemented in IR; most compiler back ends are not architected to host optimization passes.


37 of 43

Solutions – Duplicated Gadget Producing Instructions

Compiler optimizations in GCC can be patched to avoid duplicating returns, indirect jumps, and indirect calls. This preserves most of the benefit of the optimization without introducing new gadgets.

Alternatively, a machine-specific back end pass could intra-procedurally merge gadget-producing instructions.
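A source-level analogy of such a merging pass (a sketch, assuming the machine pass funnels multiple returns through one shared epilogue):

```c
/* Two exit points compile to two 'ret' instructions: two gadget
 * terminators. */
int foo_two_exits(int x, int y) {
    if (x > 0)
        return y * 2;
    return y + 8;
}

/* Single-exit form: the back end emits one 'ret', halving the
 * gadget-producing instructions while preserving semantics. */
int foo_single_exit(int x, int y) {
    int r;
    if (x > 0)
        r = y * 2;
    else
        r = y + 8;
    return r;
}
```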



39 of 43

Solutions – Special Purpose Gadgets as Optimizations

JOP Dataloader gadgets introduced by Tail Call Elimination cannot be fixed through static software transformation without violating the program’s functional semantics.

Dynamic gadget protection methods that incur runtime overhead could be used.

Alternatively, we can choose to disable optimizations such as Tail Call Elimination if the performance benefit is not worth the introduction of this type of gadget.

This could be the case when a program doesn’t otherwise contain these gadgets and tail call elimination would only optimize code that runs infrequently.


40 of 43

Some Closing Notes on Performance Implications

The performance benefits of compiler optimizations are quite significant.

Solutions to mitigate the introduction of high quality gadgets all carry a cost, which must be considered before use.

  1. Using alignment sleds and transformation techniques to get rid of unintended gadgets increases code size on the order of 30% and can result in small increases to execution time.
  2. Using transformation techniques to merge gadget-producing instructions will result in small increases to code size and execution time.
    • This won’t work for all instances observed, specifically cloning optimizations. Disabling these optimizations will have an effect similar to using –O2 versus –O3.
  3. Disabling specific optimizations such as Dead Store Elimination or Tail Call Elimination may have significant performance impacts, depending on the source program.


41 of 43

Conclusion - Summary of Key Takeaways

  • Although compilers have been formally proven correct with respect to functional semantics, a gap exists between correctness and security in compiler behavior.
  • When implementing security sensitive code or security controls, it is important for the programmer to consider how compiler optimizations might unintentionally introduce weaknesses.
  • Compiler optimizations can also increase the number and quality of code reuse gadgets made available to a potential attacker. Certain optimizations are worse than others.
  • Developing solutions to these problems remains an open area of research, but potential solutions exist. Some optimizations may involve simply making an informed tradeoff between performance and security.


42 of 43

Questions?


43 of 43

References

  1. Vijay D’Silva, Mathias Payer, and Dawn Song. 2015. The Correctness-Security Gap in Compiler Optimization. In 2015 IEEE Security and Privacy Workshops (SPW). IEEE.
  2. Using the GNU Compiler Collection (GCC). https://gcc.gnu.org/onlinedocs/gcc/.
  3. Michael D. Brown and Santosh Pande. 2019. Is Less Really More? Towards Better Metrics for Measuring Security Improvements Realized Through Software Debloating. In 12th USENIX Workshop on Cyber Security Experimentation and Test (CSET 19). USENIX Association.
  4. GadgetSetAnalyzer. https://github.com/michaelbrownuc/GadgetSetAnalyzer
  5. Hovav Shacham. 2007. The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86). In Proceedings of the 14th ACM conference on Computer and communications security (CCS '07). ACM, New York, NY, USA, 552-561. DOI: https://doi.org/10.1145/1315245.1315313.
  6. Tyler Bletsch, Xuxian Jiang, Vince W. Freeh, and Zhenkai Liang. 2011. Jump-oriented programming: a new class of code-reuse attack. In Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security (ASIACCS '11). ACM, New York, NY, USA, 30-40. DOI: https://doi.org/10.1145/1966913.1966919.
  7. Stephen Checkoway, Lucas Davi, Alexandra Dmitrienko, Ahmad-Reza Sadeghi, Hovav Shacham, and Marcel Winandy. 2010. Return-oriented programming without returns. In Proceedings of the 17th ACM conference on Computer and communications security (CCS '10). ACM, New York, NY, USA, 559-572. DOI: https://doi.org/10.1145/1866307.1866370
  8. AliAkbar Sadeghi, Salman Niksefat and Maryam Rostamipour. 2018. Pure-Call Oriented Programming (PCOP): chaining the gadgets using call instructions. In Computer Virology Hacking Techniques 14: 139. Springer Paris. https://doi.org/10.1007/s11416-017-0299-1.
  9. IDA: About. https://www.hex-rays.com/products/ida/index.shtml
  10. zynamics BinDiff. https://www.zynamics.com/bindiff.html
  11. Kaan Onarlioglu, Leyla Bilge, Andrea Lanzi, Davide Balzarotti, and Engin Kirda. 2010. G-Free: defeating return-oriented programming through gadget-less binaries. In Proceedings of the 26th Annual Computer Security Applications Conference (ACSAC '10). ACM, New York, NY, USA, 49-58. DOI=10.1145/1920261.1920269 http://doi.acm.org/10.1145/1920261.1920269
