Preempting Flaky Tests via Non-Idempotent-Outcome Tests
Anjiang Wei, Pu Yi, Zhengxi Li, Tao Xie, Darko Marinov, Wing Lam
Funding acknowledgments
CCF-1763788, CCF-1956374
62161146003
Developer Anecdote
[Figure: at 4:15 PM the change below is sent to the servers, which build the code and run tests test0, test1, test2, …, testn.]
…
- static int add() {
+ static int add(r) {
-   ts.addRow("");
+   ts.addRow(r);
    return ts.size();
…
[Figure: if the tests pass, the developer merges the changes.]
[Figure: if a test fails, the developer debugs the changes.]
[Figure: timeline of repeated build-and-test cycles on the servers at 4:15 PM, 5:00 PM, 5:30 PM, and 6:15 PM.]
Developer wastes time debugging and running tests and goes home 1 hour and 15 min later.
Flaky Test: a test that can non-deterministically pass and fail when run on the same code version.
Public Outcry About Flaky Tests
What are Flaky Tests?
Background: Victim and Polluter
// shared variable x is initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
TestOrder1: t1, t2
TestOrder2: t2, t1
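The effect of the two test orders can be reproduced with a minimal Python sketch. The `run_order` helper and the dictionary-based shared state are illustrative assumptions, not something taken from the slides.

```python
# Shared state, standing in for the shared variable x on the slide.
state = {"x": 0}

def t1():
    # Victim: passes only if x still has its initial value.
    assert state["x"] == 0

def t2():
    # Polluter: modifies the shared state and never restores it.
    state["x"] = 1

def run_order(tests):
    """Hypothetical helper: reset state, run tests in order, record outcomes."""
    state["x"] = 0
    outcomes = {}
    for test in tests:
        try:
            test()
            outcomes[test.__name__] = "pass"
        except AssertionError:
            outcomes[test.__name__] = "fail"
    return outcomes

order1 = run_order([t1, t2])  # victim runs before the polluter
order2 = run_order([t2, t1])  # polluter runs before the victim
print(order1, order2)
```

In this sketch the victim passes under TestOrder1 and fails under TestOrder2, while the polluter passes in both.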
Background: Latent-Victim, Latent-Polluter
// shared variables x, y, z are initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
void t3() { assert y == 0; } // latent-victim
void t4() { z = 1; } // latent-polluter
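One way to read the "latent" labels is that the problem stays invisible until a matching test appears. The Python sketch below illustrates this under that assumption; the test `t6` and the `passes` helper are hypothetical and do not appear on the slide.

```python
state = {"x": 0, "y": 0, "z": 0}

def t3():
    # Latent-victim: reads y, but no current test modifies y.
    assert state["y"] == 0

def t4():
    # Latent-polluter: modifies z, but no current test reads z.
    state["z"] = 1

def t6():
    # Hypothetical new test (not on the slide) that reads z.
    assert state["z"] == 0

def passes(test):
    """Hypothetical helper: True if the test raises no AssertionError."""
    try:
        test()
        return True
    except AssertionError:
        return False

# With only the existing tests, t4's pollution of z goes unnoticed.
before = [passes(t) for t in (t4, t3)]
# Once t6 exists, running it after t4 exposes the pollution.
after = passes(t6)
```

Here `t4` only becomes a visible polluter once some test such as `t6` depends on `z`.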
Non-Idempotent-Outcome (NIO) Test
// shared variables x, y, z, w are initialized to 0
void t1() { assert x == 0; } // victim
void t2() { x = 1; } // polluter
void t3() { assert y == 0; } // latent-victim
void t4() { z = 1; } // latent-polluter
void t5() { assert w == 0; w = 1; } // NIO
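A simple way to surface a test like `t5` is to run it twice in the same process and compare the outcomes. The Python sketch below is a simplification under that assumption; `outcome` and `is_nio` are hypothetical helper names, not the tooling described in the talk.

```python
state = {"w": 0}

def t5():
    # NIO test: asserts on w and then pollutes it for any later run.
    assert state["w"] == 0
    state["w"] = 1

def outcome(test):
    """Hypothetical helper: map a test run to "pass" or "fail"."""
    try:
        test()
        return "pass"
    except AssertionError:
        return "fail"

def is_nio(test):
    """Run the test twice without resetting state; flag differing outcomes."""
    first = outcome(test)
    second = outcome(test)
    return first != second

result = is_nio(t5)
print(result)
```

The first run passes and leaves `w` set to 1, so the second run fails, and the test is flagged.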
Why should we detect NIOs?
1 Gyori et al., “Reliable testing: Detecting state-polluting tests to prevent test dependency”, ISSTA 2015
2 Huo and Clause, “Improving oracle quality by detecting brittle assertions and unused inputs in tests”, FSE 2014
Contributions
Real Example of NIO
def cmd_mock():
    def _cmd_mock(name: str):
        cmd.__overrides__[name] = ['/bin/true']
    yield _cmd_mock
-   cmd.__overrides__ = []
+   cmd.__overrides__ = {}

def test_slurm_command(tmp_path, cmd_mock):
    cmd_mock('srun')

Buggy Cleaning Code: the code after the yield resets cmd.__overrides__ to a list, so when the test runs a second time the string-keyed assignment fails with:
TypeError: list indices must be integers or slices, not str
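The failure can be reproduced without a test framework by driving the generator by hand. In the Python sketch below, the `cmd` namespace, the `buggy` flag, and `run_test_once` are assumptions made for illustration, and `__overrides__` is assumed to start out as a dict.

```python
import types

# Stand-in for the real `cmd` module; __overrides__ is assumed to start as a dict.
cmd = types.SimpleNamespace(__overrides__={})

def cmd_mock(buggy):
    def _cmd_mock(name):
        cmd.__overrides__[name] = ["/bin/true"]
    yield _cmd_mock
    # Cleaning code that runs after the test body.
    cmd.__overrides__ = [] if buggy else {}

def run_test_once(buggy):
    """Hypothetical helper: setup, test body, then teardown."""
    fixture = cmd_mock(buggy)
    mock = next(fixture)       # setup: everything before `yield`
    try:
        mock("srun")           # test body
        return "pass"
    except TypeError:
        return "fail"
    finally:
        next(fixture, None)    # teardown: everything after `yield`

buggy_runs = [run_test_once(buggy=True) for _ in range(2)]
cmd.__overrides__ = {}         # reset before trying the fixed cleaning code
fixed_runs = [run_test_once(buggy=False) for _ in range(2)]
print(buggy_runs, fixed_runs)
```

With the buggy cleaning code the second run fails. Resetting to a dict keeps both runs passing.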
Real Example of NIO
def to_zero(tvd, northing, easting,
            surface_northing, surface_easting):
    # perform some checking
-   northing -= surface_northing
-   easting -= surface_easting
+   northing = northing - surface_northing
+   easting = easting - surface_easting
    return tvd, northing, easting

# initialization for global variables: g1,…,g5
g1 = ...
def test_zero():
    # global variables passed in as arguments
    v1, v2, v3 = to_zero(g1, g2, g3, g4, g5)
    np.testing.assert_equal(...)  # assertion

Fix: Avoid Function Side Effect. The in-place -= updates the global arrays passed in as arguments, so a second run of the test starts from modified inputs and fails with:
AssertionError: Mismatched elements: 121 / 121 (100%)
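The difference between `a -= b` and `a = a - b` can be shown with a small stand-in for an array type. In the Python sketch below, the `Vec` class, the simplified two-argument functions, and the sample values are assumptions for illustration. `Vec` mimics NumPy arrays by updating itself in place for `-=`.

```python
class Vec:
    """Tiny stand-in for a NumPy array (assumption for illustration)."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, other):
        # Returns a new object; both operands are left untouched.
        return Vec(a - b for a, b in zip(self.values, other.values))

    def __isub__(self, other):
        # In-place update of this object, as NumPy arrays do for `-=`.
        self.values = [a - b for a, b in zip(self.values, other.values)]
        return self

def to_zero_buggy(northing, surface_northing):
    northing -= surface_northing             # mutates the caller's object
    return northing

def to_zero_fixed(northing, surface_northing):
    northing = northing - surface_northing   # rebinds the local name only
    return northing

g_northing = Vec([10, 20])
g_surface = Vec([1, 2])

to_zero_fixed(g_northing, g_surface)
after_fixed = list(g_northing.values)   # global is unchanged

to_zero_buggy(g_northing, g_surface)
after_buggy = list(g_northing.values)   # global has been modified
print(after_fixed, after_buggy)
```

Because the buggy version changes the global input, a second call would start from different data than the first.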
Prevalence of NIO Tests
Conclusion:
                      | Java | Python |
# Test Suites (total) | 127  | 1006   |
# Test Suites w/ NIO  | 34   | 138    |
% Test Suites w/ NIO  | 26%  | 9%     |
# NIO Tests           | 223  | 138    |
Different Detection Modes
Test Suite
  TestClass A: t1, t2
  TestClass B: t3
Experience with Fixing NIO Tests
NIO vs. Polluter vs. Victim
Conclusions
Questions? Email: Anjiang Wei <anjiang@stanford.edu>