1 of 17

Finding Polluter Tests Using Java PathFinder

Pu Yi Anjiang Wei Wing Lam

Tao Xie Darko Marinov

lukeyi@pku.edu.cn

2 of 17

What are Flaky Tests?

  • A test is flaky if it can nondeterministically pass or fail for the same code
    • Misleads developers to debug nonexistent faults in recent changes
    • Reduces trust in tests

2

3 of 17

What are Order-Dependent Tests?

  • Order-dependent tests are a prominent category of flaky tests
    • A polluter test (testInc)
    • A victim test (testGet)
  • The order of unit tests is not fixed (e.g., in JUnit), so the victim test appears flaky
    • testGet, testInc => both pass
    • testInc, testGet => testGet fail

3

4 of 17

Why Look for Polluter Tests?

  • Polluter tests are tests that modify shared state among tests

  • Detecting (and fixing) polluter tests is important to prevent victim tests from failing

4

5 of 17

Prior Work

  • PolDet1 -- Pol(luter)Det(ector)
    • Detect polluter tests in regular testing on Java Virtual Machine
    • Run each test, capture and compare the shared pre-state and the post-state
      • Shared state includes the part of the heap reachable from the static (class) fields (in JUnit 4 -- in JUnit 5 and TestNG can also include the test instance shared across test runs)
      • Also compare parts of the file system

5

1. Gyori et al. "Reliable testing: detecting state-polluting tests to prevent test dependency." ISSTA 2015

6 of 17

PolDet’s Original Implementation

  • Identify the static fields
    • Use a Java agent to track all loaded classes
  • Capture state via serialization
    • Traverse the relevant part of the heap
    • Use the XStream library for XML serialization
  • Compare states
    • Use the XMLUnit library for XML comparison

6

7 of 17

Challenge for PolDet

  • Lazy class loading
    • Java programs dynamically discover what classes are needed and load them only when needed
    • Pre-state and post-state can trivially differ because of different static fields when the test execution loads a new class
    • Could lead to reporting false positives

7

8 of 17

How to Avoid False Positives?

  • Common-root isomorphism
    • View pre-state and post-state as multi-rooted graphs
      • Nodes represent heap objects (including arrays) and primitive values
      • Edges represent object fields (including array indexes)
      • Graph roots correspond to the static fields
    • Find the set of common roots for the two graphs
    • Compare whether the subgraphs reachable from these common roots are isomorphic (up to the identity of nodes)

8

9 of 17

9

10 of 17

10

11 of 17

11

12 of 17

How to Implement PolDet in JPF

  • Our primary motivation for this work was to learn more about JPF
    • Also wanted to compare JPF implementation with the original implementation
  • JPF already serializes the entire JVM state
    • Roots
      • Static class fields (ClassLoaders)
      • Stack frame for each thread (StackFrames)
      • Thread/lock monitor status (ThreadStates)
    • Heap
      • Objects reachable from these roots
  • For PolDet, need to serialize only objects reachable from static fields

12

13 of 17

Our Implementation of PolDet in JPF

  • Developed a new customized state comparison in JPF
    • Support common-root isomorphism by ignoring newly loaded classes in the test
    • To further support JUnit, ignore irrelevant fields when serializing the state (e.g., JUnit variables, caches)
  • Wrote a simple JUnit listener
    • To call our code to capture the pre-state and the post-state, and to compare the states
  • Wrote a total of ~200 LoC
    • Much smaller than the original implementation

13

14 of 17

Experiments

  • Applied on 189 test classes from 14 open-source Java projects
    • Projects from the original PolDet paper
  • Found 33 polluter tests
  • Measured overhead
    • Average 1.25x

14

15 of 17

Example Real Polluter Test

One polluter test we found in project spark uses powermock-reflection to modify a static field

15

modifies static field ExceptionMapper.servletInstance

16 of 17

Some Lessons Learned from Our Work

  • Pros
    • JPF made it simple to implement PolDet (~200 LoC)
    • Low overhead of PolDet@JPF compared to base JPF (1.25x)
  • Cons
    • JPF can run only some real test classes (189 / 1005 = 18.8%)
    • PolDet@JPF may have false negatives (ignoring roots, as PolDet)
  • Future work
    • Track and report the name of modified fields, visualize results
    • Add user configuration options (e.g., specify fields to ignore)
    • Evaluate false negatives by eager class loading

16

17 of 17

Some Points for Discussion

  • How could JPF run more real code?
    • Missing native peers are the biggest problem
    • How to prioritize which peers could help the most?
    • Some incomplete peers lead to bugs later on (e.g., return null)
  • When will the CFSerializer bug be fixed?
    • There was a report on the JPF mailing list in early August
    • Our implementation builds on FilteringSerializer�(and CFSerializer is a subclass that could run faster)
  • How could we help JPF community?
    • E.g., would visualizing state comparison be interesting to people?

17

Ran only 189 of 1005�test classes we tried