1 of 36

To Be Precise :

Regression Aware Debugging

Rohan Bavishi, Awanish Pandey, Subhajit Roy

Indian Institute of Technology Kanpur

(IIT Kanpur)

OOPSLA ‘16, Amsterdam, Netherlands (Nov. 4, 2016)

2 of 36

Introduction to Bug Localization...

Input/Output or Input/Assert Pairs

Contains a failing test

IIT Kanpur

To Be Precise : Regression Aware Debugging

TEST-SUITE

LOCALIZER

{ Set of SUSPICIOUS Program Locations }

3 of 36

What is a good Bug Localization?

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • The set of suspicious locations is small w.r.t actual program size
  • The developer finds a fix at one of these locations

4 of 36

Background : Partial-MaxSAT for Bug Localization

IIT Kanpur

To Be Precise : Regression Aware Debugging

Partial MaxSAT?

CNF Formula

HARD Clauses

CANNOT be relaxed(ignored)

SOFT Clauses

CAN be relaxed(ignored)

Minimum number of clauses to relax for satisfiability?

5 of 36

BugAssist [Jose et al. ‘11]

Input Constraint : (c == 0)

Output Constraint : (a == 2)

(Buggy Testcase)

IIT Kanpur

To Be Precise : Regression Aware Debugging

HARD

HARD

SOFT

SOFT

SOFT

SOFT

Min Relaxed Clauses for SAT?

Ans : 1

a = 3;

a = 2;

HARD

6 of 36

BugAssist [Jose et al. ‘11]

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • Outputs minimal number of relaxed statements to fix the failing test
  • Outputs a relatively small but still large set of suspicious locations (~15-20 for a 100 LOC program)
  • Can we do better?

7 of 36

Regression Awareness

IIT Kanpur

To Be Precise : Regression Aware Debugging

No suspicious locations are created equal

  • BugAssist looks at just the failing test. What about the passing tests?
  • Can we reasonably predict whether any change at a particular location is likely to cause regression? (even if it fixes the failing test)
  • Augment the suspicious locations with a regression score
    • Indicates how hostile it is to passing tests

8 of 36

Our Contribution

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • We extract proofs of correctness of passing tests
  • We add additional constraints (roadblocks) that discourage modifications that may damage the proof of a passing test.
  • These roadblocks are summaries of program execution (passing tests) at various points (basic block boundaries), which we compute using Craig Interpolation
  • We perform Partial MaxSAT on the new formula
  • And Compute Regression Score from result

Tintin

9 of 36

Background : Craig Interpolation

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • Let (A ∧ B) be unsatisfiable. Then is a Craig Interpolant of (A ∧ B) if :
    • A =>
    • ∧ B is still unsatisfiable
    • only contains free variables common to A and B
  • summarizes the role of A in the unsatisfiability
  • Use the negation of the desired output-condition in passing tests to get unsatisfiable formula.

10 of 36

Craig Interpolation (on an execution trace)

IIT Kanpur

To Be Precise : Regression Aware Debugging

}

A

}

B

Negated Output Constraint : (a != 1)

(Passing Testcase)

Input Constraint : (c == 1)

11 of 36

Regression Score

  • Min number of relaxed clauses =

No. of relaxed statements (ns) + No. of relaxed roadblocks (nr)

  • Regression Score = f (ns, nr)
    • Lower the score, less likely it is for a change to induce regression (more suspicious is the location)

IIT Kanpur

To Be Precise : Regression Aware Debugging

Slide* oopslaWorstSlide = this + 2;

12 of 36

Motivating Example

IIT Kanpur

To Be Precise : Regression Aware Debugging

Traffic Collision Avoidance System (Part of Siemen’s Benchmarks)

Artificially injected faults to create ~40 buggy versions

13 of 36

Motivating Example

IIT Kanpur

To Be Precise : Regression Aware Debugging

Failing Test :

Own_Tracked_Alt == Other_Tracked_Alt

14 of 36

How does BugAssist perform?

IIT Kanpur

To Be Precise : Regression Aware Debugging

15 of 36

How does Tintin perform?

IIT Kanpur

To Be Precise : Regression Aware Debugging

16 of 36

Motivating Example

IIT Kanpur

To Be Precise : Regression Aware Debugging

Passing Tests (Truncated)-

  1. Own_Tracked_Alt = 500

Other_Tracked_Alt = 424

Output : 1

2) Own_Tracked_Alt = 560

Other_Tracked_Alt = 601

Output : 0

...

(Own_Tracked_Alt > Other_Tracked_Alt)

return 0;

(Own_Tracked_Alt < Other_Tracked_Alt)

return 1;

17 of 36

Proof Breakage != Test Failure (Regression)

IIT Kanpur

To Be Precise : Regression Aware Debugging

18 of 36

Implementation

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • Built on top of the CBMC model checker (Clarke et al. ‘04)
  • MathSAT for interpolation (Griggio et al. ‘13)
  • MsUncore for Partial MaxSAT (Marques-Silva et al. ‘13)

19 of 36

Evaluation

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • RQ 1 : Does regression awareness improve the results for bug localization?
    • RQ 1a : With respect to the ground-truth location/repair
    • RQ 1b : With respect to all repairable locations

  • RQ 2 : How much does regression awareness cost (in terms of number of clauses and time) for bug localization?

20 of 36

RQ 1a (w.r.t Ground-Truth location)

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • Assumptions and Methodology :
    • Assume the developer goes in decreasing order of suspiciousness
    • For (Re)BugAssist, score is 1 for every location (random ordering)
    • For Tintin, random ordering between locations with equal score
    • Representative set of passing tests used (To prevent blow-up)
  • Quality measured by number of locations required to be examined before reaching ground-truth location/repair
  • Based on developer effort saved (w.r.t BugAssist)

21 of 36

RQ 1a

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • RQ 1 : Does regression awareness improve the results for bug localization? (reduction in developer effort)
    • Worst-Case :
      • TCAS - 39%
      • Larger Programs (Siemens) - 50%
      • SV-COMP and Cascade - 65%
    • Average-Case :
      • TCAS - 15%
      • Larger Programs (Siemens) - 50%
      • SV-COMP and Cascade - 61%

22 of 36

RQ1b (w.r.t all Repairable locations)

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • For a given set of passing tests and the failing test under inspection -
    • The ground-truth might not be the only viable repair location
    • Many possible locations where repairs can be introduced without causing regression
  • Does our algorithm rank most of these viable locations high (most suspicious)?

23 of 36

RQ1b

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • Use a recent state-of-the-art repair tool Angelix (Mechtaev et al. ICSE16) to generate repairs (approximation)
  • Two configurations -
    • All possible grammars (Strongest setting) - Timeout of 20 minutes
    • Small grammars : No Timeout
  • ALL passing tests used (~1000)
  • Plot the rankings of these repairs

24 of 36

Evaluation Approach 2

IIT Kanpur

To Be Precise : Regression Aware Debugging

75% of crosses lie below the 60% mark

25 of 36

RQ2 : Overhead

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • RQ 2 : How much does regression awareness cost (in terms of number of clauses and time) for bug localization?
    • For TCAS :
      • Formula blow-up : 1.33x
      • Slow-down : 4.5x

26 of 36

What about Program Repair

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • I’m lazy, I want repairs, not localizations
  • ReDirectFix based on the DirectFix algorithm (Mechtaev et al. ‘15); however, restricted to classes of repairs (off-by-one etc.)
  • SMT solver searches for possible mutations that fix all tests --- regression freedom
  • However, formula can blowup!
  • Can we use regression awareness?

27 of 36

Tintin - A compromise

IIT Kanpur

  • We don’t guarantee regression freedom, but we deal with smaller formulas
  • Use interpolants as preconditions and postconditions for the statement under consideration for repair (via grammar)

INTERPOLANT PRECOND - (a > b)�if (choice == 1) {� selection = a + b;�} else if (choice == 2) {� selection = a - b;�}...�INTERPOLANT POSTCOND - (selection > 0)

28 of 36

Repair in Tintin

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • RQ 1 : Does using regression awareness instead of regression freedom degrade results?
    • The ground-truth repair was ranked the highest in more than 70% of the programs in our experiments
  • RQ 2 : What is the impact on formula size and solving time?
    • Reduction of around 43% on formula sizes
    • We attained speedups between 1.3x to 6.5x

29 of 36

Summary

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • A novel technique of using Craig Interpolants as summaries of passing tests to improve Bug Localization

  • A potential application of using interpolants to speed up program repair without compromising much on quality

30 of 36

Thank you!

Questions?

31 of 36

Evaluation Approach 1

IIT Kanpur

To Be Precise : Regression Aware Debugging

TCAS (All Versions) - Worst Case (Last in random ordering)

32 of 36

Evaluation Approach 1

IIT Kanpur

To Be Precise : Regression Aware Debugging

TCAS (All Versions) - Average Case (Average pos in random ordering)

33 of 36

Regression Freedom in Repair

IIT Kanpur

To Be Precise : Regression Aware Debugging

  • Since we are dealing with actual repairs, we can ensure it works for every test by working on a combined program formula
  • This formula is obtained by concatenating the program formulas for every test considered (passing or failing)
  • Guarantees regression freedom

  • Formula size blow-up!

34 of 36

ReDirectFix

IIT Kanpur

  • Consider a simple grammar :
    • Off-By-One errors
    • Incorrect Relational Operators
    • Incorrect Logical Operator
  • The solver tries to find an applicable grammar rule to fix the program

35 of 36

Background : Partial-MaxSAT for Bug Localization

IIT Kanpur

To Be Precise : Regression Aware Debugging

Programs as

Logic?

36 of 36

IIT Kanpur

To Be Precise : Regression Aware Debugging