1 of 32

To be Precise:

Regression Aware Debugging

Awanish Pandey

Assistant Professor

Indian Institute of Technology Roorkee

Published in OOPSLA ‘16

2 of 32

What is Regression

A previously working feature stops working
A bug has been introduced
Types of regression

Functional
Performance

Detection

Regression testing
Static and Dynamic Analysis

3 of 32

Introduction to Bug Localization...

Input/Output or Input/Assert Pairs

Contains a failing test

To Be Precise : Regression Aware Debugging

TEST-SUITE

LOCALIZER

{ Set of SUSPICIOUS Program Locations }

4 of 32

What is a good Bug Localization?

To Be Precise : Regression Aware Debugging

The set of suspicious locations is small w.r.t actual program size
The developer finds a fix at one of these locations

5 of 32

Background : Partial-MaxSAT for Bug Localization

To Be Precise : Regression Aware Debugging

Partial MaxSAT?

HARD Clauses

CANNOT be relaxed(ignored)

SOFT Clauses

CAN be relaxed(ignored)

Minimum number of clauses to relax for satisfiability?

CNF Formula

(x₁||x₂||x₃) && (!x₁||x₂||x₃) && (x₁||!x₂||x₃) && (x₁||x₂||!x₃)

6 of 32

BugAssist [Jose et al. ‘11]

Input Constraint : (c == 0)

Output Constraint : (a == 2)

(Buggy Testcase)

To Be Precise : Regression Aware Debugging

HARD

SOFT

(c==0) && (a == 0) &&

(((c == 1) && (a_1 == a + 1)) ||

((c != 1) && (a_1 == a - 1))) &&

(a_1 == 2)

a = 2;

HARD

a = 3;

7 of 32

BugAssist [Jose et al. ‘11]

To Be Precise : Regression Aware Debugging

Outputs minimal number of relaxed statements to fix the failing test
Outputs a relatively small but still large set of suspicious locations (~15-20 for a 100 LOC program)
Can we do better?

8 of 32

Regression Awareness

To Be Precise : Regression Aware Debugging

No suspicious locations are created equal

BugAssist looks at just the failing test. What about the passing tests?
Can we reasonably predict whether any change at a particular location is likely to cause regression? (even if it fixes the failing test)
Augment the suspicious locations with a regression score

Indicates how hostile it is to passing tests

9 of 32

Our Contribution

To Be Precise : Regression Aware Debugging

We extract proofs of correctness of passing tests
We add additional constraints (roadblocks) that discourage modifications that may damage the proof of a passing test.
These roadblocks are summaries of program execution (passing tests) at various points (basic block boundaries), which we compute using Craig Interpolation
We perform Partial MaxSAT on the new formula
And Compute Regression Score from result

Tintin

10 of 32

Background : Craig Interpolation

To Be Precise : Regression Aware Debugging

Let (A ∧ B) be unsatisfiable. Then 工 is a Craig Interpolant of (A ∧ B) if :

A => 工
工 ∧ B is still unsatisfiable
工 only contains free variables common to A and B

工 summarizes the role of A in the unsatisfiability
Formula (a==-3) && (a+1>1)
Interploant (a<1)
Use the negation of the desired output-condition in passing tests to get unsatisfiable formula.

11 of 32

Craig Interpolation (on an execution trace)

To Be Precise : Regression Aware Debugging

工

}

Negated Output Constraint : (a != 1)

(Passing Testcase)

Input Constraint : (c == 1)

12 of 32

Regression Score

Min number of relaxed clauses =

No. of relaxed statements (ns) + No. of relaxed roadblocks (nr)

Regression Score = f (ns, nr)

Lower the score, less likely it is for a change to induce regression (more suspicious is the location)

To Be Precise : Regression Aware Debugging

13 of 32

Motivating Example

To Be Precise : Regression Aware Debugging

Traffic Collision Avoidance System (Part of Siemen’s Benchmarks)

Artificially injected faults to create ~40 buggy versions

14 of 32

Motivating Example

IIT Kanpur

To Be Precise : Regression Aware Debugging

Failing Test :

Own_Tracked_Alt == Other_Tracked_Alt

15 of 32

How does BugAssist perform?

IIT Kanpur

To Be Precise : Regression Aware Debugging

16 of 32

How does Tintin perform?

IIT Kanpur

To Be Precise : Regression Aware Debugging

17 of 32

Motivating Example

IIT Kanpur

To Be Precise : Regression Aware Debugging

Passing Tests (Truncated)-

Own_Tracked_Alt = 500

Other_Tracked_Alt = 424

Output : 1

2) Own_Tracked_Alt = 560

Other_Tracked_Alt = 601

Output : 0

...

(Own_Tracked_Alt > Other_Tracked_Alt)

⇓

return 0;

(Own_Tracked_Alt < Other_Tracked_Alt)

⇓

return 1;

18 of 32

Proof Breakage != Test Failure (Regression)

To Be Precise : Regression Aware Debugging

19 of 32

Implementation

To Be Precise : Regression Aware Debugging

Built on top of the CBMC model checker (Clarke et al. ‘04)

Constraint Based Model Checker
Verification Condition
preCondition && ProgramFormula && !PostCondition

MathSAT for interpolation (Griggio et al. ‘13)
MsUncore for Partial MaxSAT (Marques-Silva et al. ‘13)

20 of 32

Evaluation

To Be Precise : Regression Aware Debugging

RQ 1 : Does regression awareness improve the results for bug localization?

RQ 1a : With respect to the ground-truth location/repair
RQ 1b : With respect to all repairable locations

RQ 2 : How much does regression awareness cost (in terms of number of clauses and time) for bug localization?

21 of 32

RQ 1a (w.r.t Ground-Truth location)

To Be Precise : Regression Aware Debugging

Assumptions and Methodology :

Assume the developer goes in decreasing order of suspiciousness
For (Re)BugAssist, score is 1 for every location (random ordering)
For Tintin, random ordering between locations with equal score
Representative set of passing tests used (To prevent blow-up)

Quality measured by number of locations required to be examined before reaching ground-truth location/repair
Based on developer effort saved (w.r.t BugAssist)

22 of 32

RQ 1a

RQ 1 : Does regression awareness improve the results for bug localization? (reduction in developer effort)

Best-Case :

TCAS - 39%
Larger Programs (Siemens) - 60%
SV-COMP and Cascade - 65%

Average-Case :

TCAS - 15%
Larger Programs (Siemens) - 50%
SV-COMP and Cascade - 61%

23 of 32

RQ1b (w.r.t all Repairable locations)

To Be Precise : Regression Aware Debugging

For a given set of passing tests and the failing test under inspection -

The ground-truth might not be the only viable repair location
Many possible locations where repairs can be introduced without causing regression

Does our algorithm rank most of these viable locations high (most suspicious)?

24 of 32

RQ1b

To Be Precise : Regression Aware Debugging

Use a repair tool Angelix (Mechtaev et al. ICSE16) to generate repairs (approximation)
Two configurations -

All possible grammars (Strongest setting) - Timeout of 20 minutes
Small grammars : No Timeout

ALL passing tests used (~1000)
Plot the rankings of these repairs

25 of 32

Evaluation Approach 2

To Be Precise : Regression Aware Debugging

75% of crosses lie below the 60% mark

26 of 32

RQ2 : Overhead

To Be Precise : Regression Aware Debugging

RQ 2 : How much does regression awareness cost (in terms of number of clauses and time) for bug localization?

For TCAS :

Formula blow-up : 1.33x
Slow-down : 4.5x

27 of 32

What about Program Repair

To Be Precise : Regression Aware Debugging

I’m lazy, I want repairs, not localizations
ReDirectFix based on the DirectFix algorithm (Mechtaev et al. ‘15); however, restricted to classes of repairs (off-by-one etc.)
SMT solver searches for possible mutations that fix all tests --- regression freedom
However, formula can blowup!
Can we use regression awareness?

28 of 32

Tintin - A compromise

We don’t guarantee regression freedom, but we deal with smaller formulas
Use interpolants as preconditions and postconditions for the statement under consideration for repair (via grammar)

INTERPOLANT PRECOND - (a > b)�if (choice == 1) {� selection = b + a;�} else if (choice == 2) {� selection = b - a;�}...�INTERPOLANT POSTCOND - (selection > 0)�

29 of 32

Repair in Tintin

To Be Precise : Regression Aware Debugging

RQ 1 : Does using regression awareness instead of regression freedom degrade results?

The ground-truth repair was ranked the highest in more than 70% of the programs in our experiments

RQ 2 : What is the impact on formula size and solving time?

Reduction of around 43% on formula sizes
We attained speedups between 1.3x to 6.5x

30 of 32

Future Work

Modification for performance-related regression bug

Massive parallelization of formula checking

Scaling for larger program

31 of 32

Summary

To Be Precise : Regression Aware Debugging

A novel technique of using Craig Interpolants as summaries of passing tests to improve Bug Localization

A potential application of using interpolants to speed up program repair without compromising much on quality

1 of 32

2 of 32

3 of 32

4 of 32

5 of 32

6 of 32

7 of 32

8 of 32

9 of 32

10 of 32

11 of 32

12 of 32

13 of 32

14 of 32

15 of 32

16 of 32

17 of 32

18 of 32

19 of 32

20 of 32

21 of 32

22 of 32

23 of 32

24 of 32

25 of 32

26 of 32

27 of 32

28 of 32

29 of 32

30 of 32

31 of 32

32 of 32