CS259 Summary Notes

© 2012 Chris McGinlay CC BY 3.0

https://plus.google.com/110075889815392203358 

Licence: Creative Commons Attribution 3.0 Unported

1. How Debuggers Work

Idea of debugging by conducting a systematic experiment to establish cause(s) of failure.

Input -> Failed Program -> Compare expected and actual outputs for given inputs -> Find defect causing failure.

Illustrated with the ‘fruit fly’ model - an HTML stripper implemented via a finite state machine.

Devil’s Guide to Debugging - Simplistic Debugging. DON’T.

Definitions

Defect - An error in code. Preferable to the term ‘bug’ which implies that something crawled in when you weren’t looking, whereas in fact the programmer put the defect there in the first place.

Failure - an externally visible error.

Error - a deviation from correct behaviour.

Infection - error in program state, propagating forward to successive states.

Program - succession of states represented by a state vector (of variables, objects etc).

A defective statement causes a change of state, such that a sane input state becomes an insane state. Marked by an infection with no precursor infection.

Possible: defect -> infection -> failure

Sometimes: defect -> infection -> no failure

However, a failure can always be traced back to a defect via a cause-effect chain. This can be HARD: tens of thousands of variables, tens of thousands of intermediate states.

Scientific Method

Posit a hypothesis. (Something you expect to happen, or think may be happening to program state). Devise an experiment to test this hypothesis. E.g. use ‘assert False’ to mark code which should never be reached (such as an undefined branch of if..elif..else).
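A minimal sketch of such an experiment (the function and tag names here are made up for illustration):

    def classify(tag):
        # Hypothesis: tag is always one of "b", "i" or "p".
        if tag == "b":
            return "bold"
        elif tag == "i":
            return "italic"
        elif tag == "p":
            return "paragraph"
        else:
            # Experiment: this branch should never be reached.
            # If the assertion fires, the hypothesis is rejected.
            assert False, "unexpected tag: %r" % tag

    print(classify("b"))   # hypothesis holds for this input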

Hypotheses -> Prediction -> Experiment -> Observation.

Observation will result in either confirmation or rejection of hypothesis. So either you refine or reject your hypothesis.

Finally, once you have established the defect causing the failure, ensure you

  1. prevent recurrence of the defect
  2. look for the same essential error elsewhere in the code

Explicit Debugging...

...as opposed to keeping the whole situation in your head.

Keep a written log of what you’ve tried. No need to memorise, and it is easy to interrupt work and resume later. The act of writing things down forces structured thinking and can reveal the solution. Similarly, talk to someone about the problem: the act of explaining forces you to understand the situation and structure your thoughts before opening your mouth.

Exercise to illustrate is SPYDER - Simple PYthon DEbuggeR. Using sys.settrace(traceit).

Arguments to traceit are frame (the current stack frame), event (a string such as 'line', 'call' or 'return'), and arg (event-dependent: e.g. the return value for 'return' events, otherwise often None).
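A minimal sketch of such a trace function (illustrative only, not the actual SPYDER code from the course):

    import sys

    def traceit(frame, event, arg):
        if event == "line":
            # Report each executed line with its function name and local variables.
            print("%s:%d %s" % (frame.f_code.co_name, frame.f_lineno, frame.f_locals))
        return traceit   # keep tracing within this scope

    def demo(x):
        y = x * 2
        return y + 1

    sys.settrace(traceit)
    demo(3)
    sys.settrace(None)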


2. Assertions

Assertions provide automatic runtime error checking and are the key to automation. This topic looks at writing assertions manually and at automatic inference of appropriate assertions.

assert condition         -> true: OK, proceed with program

                        -> false: Interrupt execution, raise exception.

Are asserts available in your language? If not, roll your own. Built-in asserts typically raise an exception reporting the failed condition and its source location, and can be disabled globally.
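A sketch of a roll-your-own assert (shown in Python purely for illustration; prefer the built-in assert where available):

    def my_assert(condition, message="assertion failed"):
        # Interrupt execution and raise an exception if the condition does not hold.
        if not condition:
            raise AssertionError(message)

    x = 25
    my_assert(x >= 0, "x must be non-negative")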

Assertions do not yield location of defect.

Assertions can be used in tests (such as unit tests) or used in code as specification (for function args and return values).

Function Pre- and Post- Conditions

Asserts as a specification. Precondition: checking arguments to functions at start of execution of function. Postcondition: checking return value just prior to returning it.

In cause-effect chain, asserts automatically check large portions of states, aiding in identifying defect. Earlier detection results in shorter cause-effect chain.

You can wrap the call to the function under test in a try:except block to catch exceptions and provide diagnostic output. (The example discussed was a square_root postcondition; a sketch follows.)
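A sketch along those lines; the Newton-style implementation and tolerance here are assumptions for illustration, not the course’s exact square_root code:

    def square_root(x, eps=1e-8):
        assert x >= 0                        # precondition: argument must be non-negative
        guess = x / 2.0 if x > 1 else 1.0    # initial guess
        prev = 0.0
        while abs(guess - prev) > eps:
            prev = guess
            guess = (guess + x / guess) / 2.0
        assert abs(guess * guess - x) < eps * max(x, 1.0)   # postcondition: result squared ~ x
        return guess

    # Wrap the call to catch assertion failures and print diagnostics.
    try:
        print(square_root(-1))
    except AssertionError as e:
        print("pre/postcondition violated: %r" % e)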

Lifetime of Assertions

Catching errors. Testing (pre and post conditions, try:except diagnostics). Documentation (precondition=spec).


Refining Assertions

                   Precondition assertions    Postcondition assertions
  Too permissive   arbitrary behaviour        will miss bugs
  Too strict       valid inputs excluded      false positives

Data Invariants

Use the __repr__() internal method to give objects a readable representation of their state for debugging.

Example: a wrong input type may only trigger a ValueError much later in execution - a TIMEBOMB in Python. Assertions can be used to mandate valid data when objects are instantiated or functions called, e.g. assert 1 <= day <= 31. Place such assertions in __init__(), possibly in a checkRep() helper function.

Invariant is a condition that always holds for an object as viewed by the user. (Invariance not guaranteed during internal operations).

Develop a checkRep() helper function and call it in the object constructor and just before returning from any method that modifies internal state (a sketch follows).
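A minimal sketch of the idea with a hypothetical date-like class (the class name, fields and simplified arithmetic are made up):

    class SimpleDate:
        def __init__(self, day, month):
            self.day = day
            self.month = month
            self.checkRep()                  # data invariant must hold after construction

        def checkRep(self):
            # The invariant as seen by users of the object.
            assert isinstance(self.day, int) and 1 <= self.day <= 31
            assert isinstance(self.month, int) and 1 <= self.month <= 12

        def advance(self, days):
            self.day = (self.day - 1 + days) % 31 + 1   # deliberately simplified arithmetic
            self.checkRep()                  # re-establish invariant before returning

    d = SimpleDate(28, 2)
    d.advance(5)
    # SimpleDate("28", 2) would now fail immediately, instead of exploding much later.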

An example of usefulness is the red-black tree structure. This structure is good for search, insert and delete (log time), but insertion and removal of nodes are DIFFICULT: many cases.

Invariants: every node is red or black; the root is black; a red node has only black children; every path from the root to a leaf contains the same number of black nodes.

If implementing such a structure (don’t: use a library instead), first thing you write is an invariant checker.

In the Eiffel language, invariants are a built-in language feature, defined in class definitions and checked along with the pre- and postconditions of public methods.


Assertions are Optional...

...and hence must not change internal behaviour of program.

Why would you turn ‘em off? Assertions take time and can hurt performance. Measure this: maybe turn off expensive asserts in production code and leave the cheap ones in.

In particular, don’t use asserts to check public preconditions such as third-party data input: that is a SECURITY hazard, since assertions can be turned off. Use non-optional input validation and raise your own exceptions (or pop up error boxes in a webapp, etc.).

Using Asserts in Production

+better to fail with an assertion than continue with bad data.

+eases debugging

+helps prevent defect ever reaching the field

+helps reveal info about the failure if defect does get to the field

-performance hit, selectively disable expensive asserts

?aborted execution not user friendly. However alternative is bad data and arbitrary behaviour.

System Invariants

These are properties holding throughout execution, such as flags or maybe buffer allocations. Can use sys.settrace on each line of code execution to see violation of this invariant. Could call a checkRep() helper function.
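A small sketch of the idea (the buffer, names and helper function are made up):

    import sys

    BUFFER = [0] * 10            # hypothetical global resource
    EXPECTED_SIZE = len(BUFFER)

    def check_system_invariant():
        # System invariant: the buffer never changes size.
        assert len(BUFFER) == EXPECTED_SIZE, "buffer size invariant violated"

    def invariant_tracer(frame, event, arg):
        if event == "line":
            check_system_invariant()   # re-check before every traced line
        return invariant_tracer

    def buggy():
        BUFFER.append(99)        # violates the invariant...
        return len(BUFFER)       # ...detected when the next line is traced

    sys.settrace(invariant_tracer)
    try:
        buggy()
    except AssertionError as e:
        print("system invariant violated: %s" % e)
    finally:
        sys.settrace(None)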

Memory buffer overflows: a system invariant here would be checking C-style array boundaries, protecting memory. Packages such as Electric Fence or Valgrind do this.

Adding Assertions in Existing Code

  1. Define data invariants. Write checkRep().
  2. Pre-condition assertions.
  3. Post condition assertions.
  4. Deploy system invariant checker such as valgrind.

1 and 2 are easy to implement, so do them first. 3 is more worthwhile once 1 and 2 are done, since we then know the input data should be sane!

The DAIKON tool (Michael Ernst et al.) can infer likely invariants from repeated program executions.


3. Simplifying Failures

“After removing some part of a failure inducing input, the failure no longer occurs”

        -> the part you just removed contains (at least part of) the failure-inducing input

Idea is to reduce and simplify large failure inducing inputs.

In a bug report - what is relevant? Condense it to a test case in which every part is necessary to induce the failure.

Automate Process of Simplification - DDMIN

This requires a strategy and an automated test. The simplest strategy is binary search (recursive split + test). The automated test must check whether the reduced input passes or fails (does or does not trigger the failure).

Binary search will often leave non-essential parts in the reduced input. Instead use delta debugging: ddmin(). Algorithm:

  1. split the input into n (initially n=2) subsets.
  2. if removing a subset still triggers the failure, proceed with the remaining input and set n -> max(n-1, 2),
  3. otherwise increase granularity: n -> min(2n, length(input)). Stop when the input can no longer be split.

It is of course useful if your implementation of ddmin can take a generic test function as an argument.

It is also efficient if your test function can take advantage of domain knowledge to intelligently leverage the input structure, splitting it into a list of elements. The test function then needs to recombine the list into an entity for testing. (In the example given, this is achieved via Python’s str.join() on the list.)
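A compact sketch of the idea; this simplified version only removes complements, takes a generic test function, and assumes the input is a list (it is not the course’s exact implementation):

    def ddmin(data, fails):
        # Reduce the list 'data' to a smaller failure-inducing input.
        # fails(subset) must return True iff the subset still triggers the failure.
        assert fails(data)
        n = 2
        while len(data) >= 2:
            chunk = len(data) // n
            reduced = False
            for i in range(n):
                # Try removing the i-th of n chunks.
                complement = data[:i * chunk] + data[(i + 1) * chunk:]
                if complement and fails(complement):
                    data = complement              # failure persists: keep the smaller input
                    n = max(n - 1, 2)
                    reduced = True
                    break
            if not reduced:
                if n >= len(data):
                    break                          # cannot split any further
                n = min(n * 2, len(data))          # increase granularity
        return data

    # Hypothetical test: the 'failure' occurs whenever '<' and '>' both survive.
    failing_input = list("foo <b>bar</b> baz")
    test = lambda chars: "<" in chars and ">" in chars
    print("".join(ddmin(failing_input, test)))     # reduces to "<>"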

Fuzzers

Generate random input data. Good way to test your software. Used as an attack vector by the bad guys to generate failures.

DDMIN Applied to Other Domains

Describing the “yesterday it worked” problem: someone has updated a library or applied a patch, and now your code fails. The question is which patch, sub-patch or library caused it. DDMIN can help, except that the test step is expensive: apply diffs, build, execute. It requires an automated build process and is CPU intensive. A popular tool is git bisect, which performs this search over commit history.


Causes and Causality

DDMIN returns a failure cause. Using a counterfactual definition of causation.

Cause. Suppose there are events A, B with A preceding B. A causes B if B does not occur unless A occurs.

When fixing a defect, you want to be sure that:

  1. defect really is an error (show how to fix it)
  2. the defect caused the failure (failure no longer occurs after the fix)

Some errors are not causes of a failure. (May be unrelated)

Some causes are not errors. (Cause may be legal input).

You want to find errors which are causes.

There can be many causes, including frivolous ones: your very existence, for example, is a cause.

Actual cause A: one that assumes and changes as little as possible, yet still changes the event B.

sys.settrace(tracefunc), where the trace function has the signature tracefunc(frame, event, arg)

Python sys module. settrace specifies system trace function. “The local trace function should return a reference to itself (or to another function for further tracing in that scope), or None to turn off tracing in that scope.” - http://docs.python.org/2/library/sys.html#sys.settrace

frame - current stack frame

frame.f_code.co_name - function name

frame.f_code.co_filename - source code filename

frame.f_code.co_varnames - local variable names in tuple

frame.f_locals - dictionary of local variables

event - string denoting current action

call, return, line, exception, c_call, c_return, c_exception

arg - None, the return value, an exception tuple, or a function object, depending on the event.


4. Tracking Origins

Goal is to obtain a cause-effect chain. Perhaps you can observe it happening. If you can’t observe it directly, you can reconstruct it from deductive reasoning (as in the Sherlock Holmes examples).

Consider propagation of errors, trace back dependencies. There are automated deduction tools such as PEX (try it online at www.pexforfun.com). This finds inputs that cause assertions to fail. Code can be C#.

Any proof of a fix is only as good as the specification. Other, novel inputs may cause different failures to the one you’ve fixed. (Good assertions help here).

At least two ways to fix the example, either

  1. introduce a quote type flag via enum-like class. class QuoteStyle: NONE, SINGLE, DOUBLE = range(3)
  2. simply record the actual quote character in the quote flag and test it.

Generally, follow data and/or control dependencies to assist with deductive reasoning.

Data Dependency

Statement B is data dependent on statement A if statement A writes a variable which B later reads, without another statement writing that variable in between.

NB: B has no data dependency on anything if B reads no variables!

Control Dependency

B is control dependent on A if statement A determines whether statement B gets executed or not.
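A tiny illustration (statement labels A, B, C are just for reference):

    def demo(n):
        x = n * 2        # A: writes x
        if x > 10:       # B: reads x -> B is data dependent on A
            y = 1        # C: runs only if B's condition holds -> C is control dependent on B
        else:
            y = 0
        return y         # data dependent on whichever assignment to y was executed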

Backward Slice of Statement S

This is the set of statements that S transitively depends on: following data and control dependencies through chains, so if S depends on B and B depends on A, then A is also in the backward slice of S.

NB: Anything not in the backward slice cannot influence S itself or influence whether S is reached. In this way, backward slice acts as a filter on what code you need to look at.

Forward Slice of Statement S

This is the set of all statements that transitively depend on S.

There exist slicing tools that can find forward and backward slices automatically.


Dynamic Slices

These apply to actual executions rather than to the programs themselves. They describe what has happened as opposed to what could happen. They are based on program traces:

Execute program => obtain trace => dynamic slice

They contain only the executed lines, only having statements which influence the output. Typical coverage tools reveal around 7% of code lines are covered in typical executions, whereas dynamic slice tools reveal around 2.8% of lines can actually influence output.

In practical terms: in the trace function, look for event == "line". Control and data dependencies can then be read off from these covered lines (this is the slice). Hopefully this reveals the cause-effect chain.
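A minimal sketch of collecting the executed lines from a trace (the dependency analysis on top of this coverage is not shown; the toy function is made up):

    import sys

    covered = set()

    def cover_tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return cover_tracer

    def middle(x, y, z):
        # Toy function under test.
        if y < z:
            return y
        return x

    sys.settrace(cover_tracer)
    middle(2, 1, 3)
    sys.settrace(None)
    print(sorted(covered))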

Focus on Likely Origins

Having obtained a set of possible causes (origins) for a failure, use the scientific method to test for the actual origin, or deductive reasoning to eliminate irrelevant origins.

Where to start if there are a large number of possible origins? In order of best place to start:

  1. look for infections
  2. look at causes - got a known state that causes failure?
  3. look at code that ‘smells’. Maybe compiler warnings.
  4. look at project bug history for buggy code
  5. look at most recently changed code
  6. look for anomalies such as log entries.

Auto Cause Effect Chain

Essentially, we use ddmin() to establish minimum failing state. If you can change variables such that failure goes away, then you have found failure cause.

Use values from a successful run to make changes during a failing run (oft underused feature of debugging tools). ddmin() will find the minimum set of such changes required.

Only thing we have manually supplied at this point is the location and iteration to test. See later.


5. Reproducing Failures

Need to be able to reproduce failing runs - in order to observe behaviour and check that a fix actually works.

Bug Categories

Bohr Bug: reliably repeatable under well defined conditions.

Heisenbug: disappears or alters behaviour as you explore it.

Mandelbug: appears chaotic or non-deterministic.

Schrödinbug: does not manifest until someone realises the program should never have worked in the first place.

What to Reproduce?

Real world program inputs include data (files, databases), user interaction, communication with other programs, randomness, the operating environment, and process/thread schedules.

Reproducing User Interaction

Implement via a ‘capture-replay’ tool: a layer interposed between the user and the program, built by intercepting the library or system calls that handle user input. For GUI recording, don’t record raw mouse movements etc., as these vary greatly from host to host; instead record function calls and parameters on GUI objects. The Selenium tool provides capture/replay for web projects. Risks with recorded ‘scripts’ include sensitive data exposure, and changes to the environment or GUI rendering recorded scripts redundant.

Implementation of a simple tool in Python: two modes, ‘capture’ and ‘replay’. In capture mode, the trace function intercepts call events, storing the function name and arguments in a list. Replay mode uses Python’s eval function to trigger the reconstructed function calls.
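A rough sketch of the two modes (heavily simplified: repr/eval argument handling only works for simple values, and the example function is made up):

    import sys

    recorded = []   # (function_name, argument_dict) pairs captured during the run

    def capture_tracer(frame, event, arg):
        if event == "call":
            # Record every Python-level call with its argument values.
            # (A real tool would filter for the functions of interest.)
            names = frame.f_code.co_varnames[:frame.f_code.co_argcount]
            recorded.append((frame.f_code.co_name,
                             dict((name, frame.f_locals[name]) for name in names)))
        return capture_tracer

    def greet(name):
        return "Hello, " + name

    # --- capture mode ---
    sys.settrace(capture_tracer)
    greet("world")
    sys.settrace(None)

    # --- replay mode ---
    for func_name, args in recorded:
        call = "%s(%s)" % (func_name, ", ".join("%s=%r" % item for item in args.items()))
        print("replaying: " + call)
        eval(call)      # re-issue the recorded call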


Statistical Debug: Coverage and Correlation with Failure

Find features of an execution that statistically correlate with failure. For example, it could be that function f is always called when the failure occurs and never called in a passing run, or that a particular line is always executed, or that a function gives a particular return value, etc.

The idea is to collect whatever execution features may be of interest and calculate a correlation coefficient against failing or passing runs. There are various ways of visualising the coverage/correlation results.

Phi Coefficient

The Pearson correlation coefficient is suitable for linearly related continuous variables.

For binary variables, use the phi coefficient, also developed by Pearson.

Line coverage example: a line is either covered or not covered - how does this correlate with PASS or FAIL? Compute phi (ϕ) for each line in the program or function.

(first index: covered / not covered; second index: 1 = failing run, 0 = passing run)

                 FAIL     PASS
  covered        n11      n10      n1x
  not covered    n01      n00      n0x
  Σ              nx1      nx0      n = nx1 + nx0 = n1x + n0x

n is the total number of executions.

ϕ = (n11 n00 - n10 n01) / sqrt(n1x n0x nx1 nx0)

-1≤ϕ≤1

+1 = maximum +ve correlation with failure, 0 = no correlation.
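A small sketch of computing ϕ for one line from run counts (the counts below are made up):

    import math

    def phi(n11, n10, n01, n00):
        # n11: covered & failing, n10: covered & passing,
        # n01: not covered & failing, n00: not covered & passing.
        n1x, n0x = n11 + n10, n01 + n00
        nx1, nx0 = n11 + n01, n10 + n00
        norm = math.sqrt(n1x * n0x * nx1 * nx0)
        return (n11 * n00 - n10 * n01) / norm if norm else 0.0

    # Suppose a line was covered in 9 of 10 failing runs but only 1 of 10 passing runs.
    print(phi(n11=9, n10=1, n01=1, n00=9))   # 0.8 -> strongly correlated with failure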

Function Correlation

Can compute phi for function coverage and return value. Usually helpful to categorise return values, in example given as [-1,0,1] with suitable definitions for non-integer return values and exceptions. Correlate each category and its complement (remember this is a binary correlation!)

(-1 or not -1) with (pass/fail),

(0 or not 0) with (pass/fail),

(+1 or not +1) with (pass/fail).

I.e. 3 phi values for each function.

6. Learning from Mistakes

Managing Bugs

Track and organise debugging efforts; this aids in preventing recurrence and in investigating where and why bugs are appearing in the software.

Bug reports vary in quality. You can program in facilities to gather vital information automatically during a crash. Above all, you need to know how to reproduce the failure (item 1 below is the most useful):

  1. Steps to reproduce (problem history, what the user was doing)
  2. Diagnostics: core dumps, stack frames, logs.
  3. User experience: what the user saw (e.g. preview crashed)
  4. User expectation: what they thought would happen.
  5. One line summary: forms the basis for search of bug database.

Store bugs in a problem database; these can be huge (e.g. Bugzilla installations). Ensure each bug is handled in finite time.

Bug Status and Lifetime

Fixed, WONTFIX etc.

Severity - impact on user. Blocker, Critical, ...

Notification - who is to be told about fixes etc?

Unconfirmed -> New -> Assigned -> Resolved* -> Verified -> Closed

Resolved can be Fixed, Duplicate, Invalid, WONTFIX, worksforme etc.

Closed -> reopened -> assigned is possible.

Resolved -> reopened -> assigned is also possible.

Duplicate bugs need management. Mark obsolete problems to maintain morale and reduce clutter.

Defect Map

Given that software changes are stored in version control (e.g. a git repo), it is a good idea to link patches to bug IDs in the bug database (“Checkin closes bug #567, lines #45,300-320”).

With this, one can map out problems in the codebase and generate defect counts per module. This can identify insecure packages.

Pareto Principle

20% of modules contain 80% of bugs.

Origin Of Bugs

Some observations and interesting results of where bugs come from.

Study your own project defect map as these generalities won’t capture all correlations with bug density.

7. Seven Steps of Debugging

Track problem

Reproduce problem

Automate and simplify (test asserts, ddmin)

Find possible infection origins

Focus on most likely origin (statistical debugging)

Isolate infection chain - develop hypotheses

Correct defect, then verify fix.

Avoid Debugging!

Get initial requirements right. Spec differentiates failure from success.  Use automation to find bugs. Develop and automate testing.

Reduce program complexity.

Set up assertions. Test early, test often.

Review your code, analyse problem history.

Finally,

Enjoy!

Thanks Andreas, I did. Great course :-)