1 of 32

To be Precise:

Regression Aware Debugging

Awanish Pandey

Assistant Professor

Indian Institute of Technology Roorkee

Published in OOPSLA ‘16

2 of 32

What is Regression

  • A previously working feature stops working
  • A bug has been introduced
  • Types of regression
    • Functional
    • Performance
  • Detection
    • Regression testing
    • Static and Dynamic Analysis

3 of 32

Introduction to Bug Localization...

Input/Output or Input/Assert Pairs

Contains a failing test

To Be Precise : Regression Aware Debugging

TEST-SUITE

LOCALIZER

{ Set of SUSPICIOUS Program Locations }

4 of 32

What is a good Bug Localization?

To Be Precise : Regression Aware Debugging

  • The set of suspicious locations is small w.r.t actual program size
  • The developer finds a fix at one of these locations

5 of 32

Background : Partial-MaxSAT for Bug Localization

To Be Precise : Regression Aware Debugging

Partial MaxSAT?

HARD Clauses

CANNOT be relaxed(ignored)

SOFT Clauses

CAN be relaxed(ignored)

Minimum number of clauses to relax for satisfiability?

CNF Formula

(x1||x2||x3) && (!x1||x2||x3) && (x1||!x2||x3) && (x1||x2||!x3)

6 of 32

BugAssist [Jose et al. ‘11]

Input Constraint : (c == 0)

Output Constraint : (a == 2)

(Buggy Testcase)

To Be Precise : Regression Aware Debugging

HARD

HARD

SOFT

SOFT

SOFT

SOFT

(c==0) && (a == 0) &&

(((c == 1) && (a_1 == a + 1)) ||

((c != 1) && (a_1 == a - 1))) &&

(a_1 == 2)

a = 2;

HARD

a = 3;

7 of 32

BugAssist [Jose et al. ‘11]

To Be Precise : Regression Aware Debugging

  • Outputs minimal number of relaxed statements to fix the failing test
  • Outputs a relatively small but still large set of suspicious locations (~15-20 for a 100 LOC program)
  • Can we do better?

8 of 32

Regression Awareness

To Be Precise : Regression Aware Debugging

No suspicious locations are created equal

  • BugAssist looks at just the failing test. What about the passing tests?
  • Can we reasonably predict whether any change at a particular location is likely to cause regression? (even if it fixes the failing test)
  • Augment the suspicious locations with a regression score
    • Indicates how hostile it is to passing tests

9 of 32

Our Contribution

To Be Precise : Regression Aware Debugging

  • We extract proofs of correctness of passing tests
  • We add additional constraints (roadblocks) that discourage modifications that may damage the proof of a passing test.
  • These roadblocks are summaries of program execution (passing tests) at various points (basic block boundaries), which we compute using Craig Interpolation
  • We perform Partial MaxSAT on the new formula
  • And Compute Regression Score from result

Tintin

10 of 32

Background : Craig Interpolation

To Be Precise : Regression Aware Debugging

  • Let (A ∧ B) be unsatisfiable. Then is a Craig Interpolant of (A ∧ B) if :
    • A =>
    • ∧ B is still unsatisfiable
    • only contains free variables common to A and B
  • summarizes the role of A in the unsatisfiability
  • Formula (a==-3) && (a+1>1)
  • Interploant (a<1)
  • Use the negation of the desired output-condition in passing tests to get unsatisfiable formula.

11 of 32

Craig Interpolation (on an execution trace)

To Be Precise : Regression Aware Debugging

}

A

}

B

Negated Output Constraint : (a != 1)

(Passing Testcase)

Input Constraint : (c == 1)

12 of 32

Regression Score

  • Min number of relaxed clauses =

No. of relaxed statements (ns) + No. of relaxed roadblocks (nr)

  • Regression Score = f (ns, nr)
    • Lower the score, less likely it is for a change to induce regression (more suspicious is the location)

To Be Precise : Regression Aware Debugging

13 of 32

Motivating Example

To Be Precise : Regression Aware Debugging

Traffic Collision Avoidance System (Part of Siemen’s Benchmarks)

Artificially injected faults to create ~40 buggy versions

14 of 32

Motivating Example

IIT Kanpur

To Be Precise : Regression Aware Debugging

Failing Test :

Own_Tracked_Alt == Other_Tracked_Alt

15 of 32

How does BugAssist perform?

IIT Kanpur

To Be Precise : Regression Aware Debugging

16 of 32

How does Tintin perform?

IIT Kanpur

To Be Precise : Regression Aware Debugging

17 of 32

Motivating Example

IIT Kanpur

To Be Precise : Regression Aware Debugging

Passing Tests (Truncated)-

  1. Own_Tracked_Alt = 500

Other_Tracked_Alt = 424

Output : 1

2) Own_Tracked_Alt = 560

Other_Tracked_Alt = 601

Output : 0

...

(Own_Tracked_Alt > Other_Tracked_Alt)

return 0;

(Own_Tracked_Alt < Other_Tracked_Alt)

return 1;

18 of 32

Proof Breakage != Test Failure (Regression)

To Be Precise : Regression Aware Debugging

19 of 32

Implementation

To Be Precise : Regression Aware Debugging

  • Built on top of the CBMC model checker (Clarke et al. ‘04)
    • Constraint Based Model Checker
    • Verification Condition
    • preCondition && ProgramFormula && !PostCondition
  • MathSAT for interpolation (Griggio et al. ‘13)
  • MsUncore for Partial MaxSAT (Marques-Silva et al. ‘13)

20 of 32

Evaluation

To Be Precise : Regression Aware Debugging

  • RQ 1 : Does regression awareness improve the results for bug localization?
    • RQ 1a : With respect to the ground-truth location/repair
    • RQ 1b : With respect to all repairable locations

  • RQ 2 : How much does regression awareness cost (in terms of number of clauses and time) for bug localization?

21 of 32

RQ 1a (w.r.t Ground-Truth location)

To Be Precise : Regression Aware Debugging

  • Assumptions and Methodology :
    • Assume the developer goes in decreasing order of suspiciousness
    • For (Re)BugAssist, score is 1 for every location (random ordering)
    • For Tintin, random ordering between locations with equal score
    • Representative set of passing tests used (To prevent blow-up)
  • Quality measured by number of locations required to be examined before reaching ground-truth location/repair
  • Based on developer effort saved (w.r.t BugAssist)

22 of 32

RQ 1a

  • RQ 1 : Does regression awareness improve the results for bug localization? (reduction in developer effort)
    • Best-Case :
      • TCAS - 39%
      • Larger Programs (Siemens) - 60%
      • SV-COMP and Cascade - 65%
    • Average-Case :
      • TCAS - 15%
      • Larger Programs (Siemens) - 50%
      • SV-COMP and Cascade - 61%

23 of 32

RQ1b (w.r.t all Repairable locations)

To Be Precise : Regression Aware Debugging

  • For a given set of passing tests and the failing test under inspection -
    • The ground-truth might not be the only viable repair location
    • Many possible locations where repairs can be introduced without causing regression
  • Does our algorithm rank most of these viable locations high (most suspicious)?

24 of 32

RQ1b

To Be Precise : Regression Aware Debugging

  • Use a repair tool Angelix (Mechtaev et al. ICSE16) to generate repairs (approximation)
  • Two configurations -
    • All possible grammars (Strongest setting) - Timeout of 20 minutes
    • Small grammars : No Timeout
  • ALL passing tests used (~1000)
  • Plot the rankings of these repairs

25 of 32

Evaluation Approach 2

To Be Precise : Regression Aware Debugging

75% of crosses lie below the 60% mark

26 of 32

RQ2 : Overhead

To Be Precise : Regression Aware Debugging

  • RQ 2 : How much does regression awareness cost (in terms of number of clauses and time) for bug localization?
    • For TCAS :
      • Formula blow-up : 1.33x
      • Slow-down : 4.5x

27 of 32

What about Program Repair

To Be Precise : Regression Aware Debugging

  • I’m lazy, I want repairs, not localizations
  • ReDirectFix based on the DirectFix algorithm (Mechtaev et al. ‘15); however, restricted to classes of repairs (off-by-one etc.)
  • SMT solver searches for possible mutations that fix all tests --- regression freedom
  • However, formula can blowup!
  • Can we use regression awareness?

28 of 32

Tintin - A compromise

  • We don’t guarantee regression freedom, but we deal with smaller formulas
  • Use interpolants as preconditions and postconditions for the statement under consideration for repair (via grammar)

INTERPOLANT PRECOND - (a > b)�if (choice == 1) {� selection = b + a;�} else if (choice == 2) {� selection = b - a;�}...�INTERPOLANT POSTCOND - (selection > 0)

29 of 32

Repair in Tintin

To Be Precise : Regression Aware Debugging

  • RQ 1 : Does using regression awareness instead of regression freedom degrade results?
    • The ground-truth repair was ranked the highest in more than 70% of the programs in our experiments
  • RQ 2 : What is the impact on formula size and solving time?
    • Reduction of around 43% on formula sizes
    • We attained speedups between 1.3x to 6.5x

30 of 32

Future Work

  • Modification for performance-related regression bug

  • Massive parallelization of formula checking

  • Scaling for larger program

31 of 32

Summary

To Be Precise : Regression Aware Debugging

  • A novel technique of using Craig Interpolants as summaries of passing tests to improve Bug Localization

  • A potential application of using interpolants to speed up program repair without compromising much on quality

32 of 32

Thank you!

Questions?