1 of 17

Finding Polluter Tests Using Java PathFinder

Pu Yi Anjiang Wei Wing Lam

Tao Xie Darko Marinov

lukeyi@pku.edu.cn

2 of 17

What are Flaky Tests?

A test is flaky if it can nondeterministically pass or fail for the same code

Misleads developers to debug nonexistent faults in recent changes
Reduces trust in tests

3 of 17

What are Order-Dependent Tests?

Order-dependent tests are a prominent category of flaky tests

A polluter test (testInc)
A victim test (testGet)

The order of unit tests is not fixed (e.g., in JUnit), so the victim test appears flaky

testGet, testInc => both pass
testInc, testGet => testGet fail

4 of 17

Why Look for Polluter Tests?

Polluter tests are tests that modify shared state among tests

Detecting (and fixing) polluter tests is important to prevent victim tests from failing

5 of 17

Prior Work

PolDet¹ -- Pol(luter)Det(ector)

Detect polluter tests in regular testing on Java Virtual Machine
Run each test, capture and compare the shared pre-state and the post-state

Shared state includes the part of the heap reachable from the static (class) fields (in JUnit 4 -- in JUnit 5 and TestNG can also include the test instance shared across test runs)
Also compare parts of the file system

1. Gyori et al. "Reliable testing: detecting state-polluting tests to prevent test dependency." ISSTA 2015

6 of 17

PolDet’s Original Implementation

Identify the static fields

Use a Java agent to track all loaded classes

Capture state via serialization

Traverse the relevant part of the heap
Use the XStream library for XML serialization

Compare states

Use the XMLUnit library for XML comparison

7 of 17

Challenge for PolDet

Lazy class loading

Java programs dynamically discover what classes are needed and load them only when needed
Pre-state and post-state can trivially differ because of different static fields when the test execution loads a new class
Could lead to reporting false positives

8 of 17

How to Avoid False Positives?

Common-root isomorphism

View pre-state and post-state as multi-rooted graphs

Nodes represent heap objects (including arrays) and primitive values
Edges represent object fields (including array indexes)
Graph roots correspond to the static fields

Find the set of common roots for the two graphs
Compare whether the subgraphs reachable from these common roots are isomorphic (up to the identity of nodes)

12 of 17

How to Implement PolDet in JPF

Our primary motivation for this work was to learn more about JPF

Also wanted to compare JPF implementation with the original implementation

JPF already serializes the entire JVM state

Roots

Static class fields (ClassLoaders)
Stack frame for each thread (StackFrames)
Thread/lock monitor status (ThreadStates)

Heap

Objects reachable from these roots

For PolDet, need to serialize only objects reachable from static fields

13 of 17

Our Implementation of PolDet in JPF

Developed a new customized state comparison in JPF

Support common-root isomorphism by ignoring newly loaded classes in the test
To further support JUnit, ignore irrelevant fields when serializing the state (e.g., JUnit variables, caches)

Wrote a simple JUnit listener

To call our code to capture the pre-state and the post-state, and to compare the states

Wrote a total of ~200 LoC

Much smaller than the original implementation

14 of 17

Experiments

Applied on 189 test classes from 14 open-source Java projects

Projects from the original PolDet paper

Found 33 polluter tests
Measured overhead

Average 1.25x

15 of 17

Example Real Polluter Test

One polluter test we found in project spark uses powermock-reflection to modify a static field

modifies static field ExceptionMapper.servletInstance

16 of 17

Some Lessons Learned from Our Work

Pros

JPF made it simple to implement PolDet (~200 LoC)
Low overhead of PolDet@JPF compared to base JPF (1.25x)

Cons

JPF can run only some real test classes (189 / 1005 = 18.8%)
PolDet@JPF may have false negatives (ignoring roots, as PolDet)

Future work

Track and report the name of modified fields, visualize results
Add user configuration options (e.g., specify fields to ignore)
Evaluate false negatives by eager class loading

17 of 17

Some Points for Discussion

How could JPF run more real code?

Missing native peers are the biggest problem
How to prioritize which peers could help the most?
Some incomplete peers lead to bugs later on (e.g., return null)

When will the CFSerializer bug be fixed?

There was a report on the JPF mailing list in early August
Our implementation builds on FilteringSerializer�(and CFSerializer is a subclass that could run faster)

How could we help JPF community?

E.g., would visualizing state comparison be interesting to people?

Ran only 189 of 1005�test classes we tried