第 1 页,共 277 页

Welcome Back!

第 2 页,共 277 页

Developer Experience, FTW!

Niranjan Tulpule

第 3 页,共 277 页

Software development is being democratized

第 4 页,共 277 页

Core computing platforms are more accessible than ever

[Chart: PCs vs. Smartphones & Tablets, 1983–2014, scale 0 to 1.5M]

第 5 页,共 277 页

Free developer tools

Open Source building blocks

Free developer education

We’re lowering the barrier to becoming a developer

第 6 页,共 277 页

It’s never been easier to write apps

[Chart: Total Number of Active Apps in the App Store, 2010–2020, scale 1M to 5M]

第 7 页,共 277 页

If the combined valuation of these 9 companies were a country's GDP, it would rank in the top 50:

  • Uber: $66B
  • Snapchat: $40B
  • Whatsapp: $16B
  • Airbnb: $25B
  • Flipkart: $15B
  • Pinterest: $11B
  • Lyft: $5.5B
  • Ola Cabs: $5B
  • Gojek: $1.3B


第 8 页,共 277 页

51% Stability Issues

41% Functionality Related

7% Speed

1% Other

Classification of 1-star reviews (Sampling of Play Store reviews, May 2016)

Writing high quality apps is still hard

第 9 页,共 277 页

Compounded complexity

[Figure: 2.5K+ device manufacturers & models × 100+ OS versions × ~700 carriers ≈ 100M permutations]

第 10 页,共 277 页

Improving software quality & testability by investing in Developer Experience.

第 11 页,共 277 页

Develop

Release

Monitor

Firebase Test Lab

for Android

第 12 页,共 277 页

Test on your users’ devices

第 13 页,共 277 页

Use with your existing workflow


Android Studio

Command line

Jenkins

Jenkins logo by Charles Lowell and Frontside CC BY-SA 3.0 https://wiki.jenkins-ci.org/display/JENKINS/Logo

第 14 页,共 277 页

Robo crawls your app automatically

第 15 页,共 277 页

Create Espresso tests by just using your app


第 16 页,共 277 页

Millions of Tests, and counting!

After extensive evaluation of the market, we've found that Firebase Test Lab is the best product for writing and running Espresso tests directly from Android Studio, saving us tons of time and effort around automated testing.

- Timothy West, Jet

第 17 页,共 277 页

Get actionable results at your fingertips

Develop

Release

Monitor

Firebase Test Lab

for Android

Play Pre-Launch Report

第 18 页,共 277 页

Pre-launch report

Pre-launch reports summarize issues found when testing your app on a wide range of devices

第 19 页,共 277 页

第 20 页,共 277 页

第 21 页,共 277 页

第 22 页,共 277 页

Apps using the Play Pre-Launch Report show ~20% fewer crashes!

~60% of the crashes seen on Pre-Launch Report are fixed before public rollout.

第 23 页,共 277 页

Get actionable results at your fingertips

Develop

Release

Monitor

Firebase Test Lab

for Android

Play Pre-Launch Report

Firebase Crash Reporting

第 24 页,共 277 页

Firebase Crash Reporting

Get actionable insights and comprehensive analytics whenever your users experience crashes and other errors

第 25 页,共 277 页

  • Integrate via Gradle (Android) or CocoaPods (iOS)
  • Add zero to one lines of initialization code
  • Start capturing errors!

第 26 页,共 277 页

Clustering

[Figure: crashes grouped into clusters with occurrence counts — fatal error A (6K, 7K), non-fatal error A (5K, 6K), fatal error B (4K, 4.8K), fatal error C (3K, 3K)]

第 27 页,共 277 页

第 28 页,共 277 页

Get the big picture with comprehensive metrics on app versions, OS levels and device models

第 29 页,共 277 页

Find the exact line where the error happens

第 30 页,共 277 页

Minimize the time and effort to resolve issues with data about your users’ devices

第 31 页,共 277 页

Log custom events before an error happens

//On Android

FirebaseCrash.log("Activity created.");

//On iOS

FIRCrashLog(@"Button clicked.");

第 32 页,共 277 页

Provide more context with events leading up to an error

第 33 页,共 277 页

Understand the Impact of Crashes on the Bottom Line


第 34 页,共 277 页

Fix the bug, then win them back with a timely push notification


第 35 页,共 277 页

Looking ahead

Machine learning

Compilers

Toolchains

第 36 页,共 277 页

The shift to mobile caught us by surprise...

[Chart: PCs vs. Smartphones & Tablets, 1983–2014, scale 0 to 1.5M]

第 37 页,共 277 页

Thank You

第 38 页,共 277 页

Docker Based Geo Dispersed Test Farm - Test Infrastructure Practice in Intel Android Program

Chen Guobing, Yu Jerry

38

第 39 页,共 277 页

Agenda

  • Test Infrastructure Challenges
  • Test as a Service
  • Docker Based Test Farm
  • Test Distribution
  • Technical Challenges
  • Questions

39

第 40 页,共 277 页

Taxonomies

40

第 41 页,共 277 页

Test Infrastructure Challenges

  • Maximize the use of Development Vehicles (engineering samples)
  • Maximize the use of automated tests
  • Minimize the maintenance cost of the test infrastructure, test benches, and test assets

41

第 42 页,共 277 页

Test as a Service – What We Need

Anyone

Any automated Test

Any Device

Anywhere

Anytime

42

第 43 页,共 277 页

Target Users - Usages

  • Test on demand and automated release testing
  • Re-run failed test cases or reproduce failures
  • Automated pre-commit and post-commit testing
  • Test on demand against a developer's own build
  • Work with other developer tools, e.g. dichotomy (bisection) checks

Target user groups: Continuous Integration testing, QA release testing, and developer testing.

43

第 44 页,共 277 页

Docker Based Geo Dispersed Test Farm

44

第 45 页,共 277 页

Test Distribution

Test Catalog: each test bench is described by its capability, platform, and location.

Test campaign example: Campaign A (capability: pmeter) — "Run campaign A on XYZ platform in SH".

The Test Distributor matches test campaigns to test benches by capability: Test Campaign ← Capability → Test Bench.

45

第 46 页,共 277 页

Technical Challenges – Anywhere, Any Device

  • DUT and Test Equipment controls (see the docker-py sketch after this list)

$ docker run … --device=/dev/bus/usb/001/004 --device=/dev/ttySerial0 …

  • DUT state transition management
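The same device pass-through can also be driven from Python with the Docker SDK (docker-py). A minimal sketch; the image name, entry point, and device paths are placeholders, not part of the Intel setup:

import docker

client = docker.from_env()
container = client.containers.run(
    'android-test-suite:latest',   # hypothetical all-in-one test suite image
    'run_campaign.sh',             # hypothetical entry point inside the image
    devices=[
        '/dev/bus/usb/001/004:/dev/bus/usb/001/004:rwm',  # DUT over USB
        '/dev/ttySerial0:/dev/ttySerial0:rwm',            # test equipment
    ],
    detach=True)
print(container.logs())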

46

第 47 页,共 277 页

Technical Challenges – Anyone, Any Automated Test

  • Hierarchical code maintenance

  • Easily customized

  • All-in-one delivery

  • Create once, run anywhere

Release and deliver test suites as Docker images.

47

第 48 页,共 277 页

Questions?

Contacts:
jerry.yu@intel.com
guobing.chen@intel.com

48

第 49 页,共 277 页

OpenHTF

an open-source hardware testing framework

https://github.com/google/openhtf

第 50 页,共 277 页

Motivation for OpenHTF

Drastically reduce the amount of boilerplate code needed to:

exercise a piece of hardware

take measurements along the way

generate a record of the whole process

Make operator interactions simple but flexible.

Allow test engineers to focus on authoring actual test logic.

“Simplicity is requisite for reliability.” ~Edsger W. Dijkstra

第 51 页,共 277 页

Google:

A Software Company

...at least, it used to be!

第 52 页,共 277 页

Google:

Now With More Hardware!

第 53 页,共 277 页

Our Solution

A Python library that provides a set of convenient abstractions for authoring hardware testing code.

第 54 页,共 277 页

Use Cases

Manufacturing Floor

Automated Lab

Benchtop

第 55 页,共 277 页

Core Abstractions

A Test is made up of Phases, and Phases take Measurements. Plugs wrap test equipment and the device under test. The Output Record is delivered through output callbacks: JSON to disk, upload via network, etc.

第 56 页,共 277 页

Tests & Phases
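(The code screenshot on this slide is not reproduced here. As a stand-in, below is a minimal sketch of a test with one phase and one measurement, modeled on OpenHTF's public hello-world example; the measurement name and limits are invented, and API details may differ between versions.)

import openhtf as htf
from openhtf.output.callbacks import json_factory
from openhtf.plugs import user_input

# A phase is a plain function that receives the running test;
# @measures declares what the phase records.
@htf.measures(htf.Measurement('widget_voltage').in_range(3.0, 3.6))
def measure_voltage(test):
    # A real phase would read this value from bench equipment via a plug.
    test.measurements.widget_voltage = 3.3

if __name__ == '__main__':
    test = htf.Test(measure_voltage)
    # One possible output callback: write the test record as JSON to disk.
    test.add_output_callbacks(json_factory.OutputToJSON('./{dut_id}.json'))
    # Prompt the operator for a DUT ID, then run the phases in order.
    test.execute(test_start=user_input.prompt_for_test_start())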

第 57 页,共 277 页

Plugs
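(Again as a stand-in for the slide's screenshot, a minimal sketch of a plug; the flashlight device is invented for illustration, and only BasePlug, the plug decorator, and tearDown are assumed from the OpenHTF API.)

import openhtf as htf
from openhtf import plugs
from openhtf.plugs import user_input

class FlashlightPlug(plugs.BasePlug):
    """Hypothetical plug wrapping a piece of bench equipment."""

    def turn_on(self):
        # Talk to the real equipment here; self.logger comes from BasePlug.
        self.logger.info('Flashlight on')

    def tearDown(self):
        # Called automatically when the test finishes.
        self.logger.info('Releasing flashlight')

# The decorator creates a shared plug instance and injects it into the phase.
@plugs.plug(flashlight=FlashlightPlug)
def use_flashlight(test, flashlight):
    flashlight.turn_on()

if __name__ == '__main__':
    htf.Test(use_flashlight).execute(
        test_start=user_input.prompt_for_test_start())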

第 58 页,共 277 页

Web GUI

第 59 页,共 277 页

Q&A

第 60 页,共 277 页

Detecting loop inefficiencies automatically

(to appear in FSE 2016)

Monika Dhok (IISc Bangalore, India)*

Murali Krishna Ramanathan (IISc Bangalore, India)

第 61 页,共 277 页

Software efficiency is very important

Performance issues are hard to detect during testing

These issues are found even in well-tested commercial software

They degrade application responsiveness and user experience

第 62 页,共 277 页

Performance bugs are critical

Implementation mistakes that cause inefficiency

Difficult to catch them during compiler optimizations

Fixing them can result in large speedups, thereby improving efficiency

第 63 页,共 277 页

Redundant traversal bugs

When a program iterates over a data structure repeatedly without any intermediate modifications

public class A {
  public boolean containsAny(Collection c1, Collection c2) {
    Iterator itr = c1.iterator();
    while (itr.hasNext())
      if (c2.contains(itr.next()))
        return true;
    return false;
  }
}

Complexity: O(size(c1) × size(c2))
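To see why fixing such a bug gives large speedups, here is the same pattern and an obvious fix sketched in Python (illustration only; the example above is Java): building a set from c2 turns each membership check into O(1), so the call drops from O(size(c1) × size(c2)) to O(size(c1) + size(c2)).

def contains_any_slow(c1, c2):
    # Redundant traversal: c2 is scanned once for every element of c1.
    for x in c1:
        if x in c2:          # O(len(c2)) when c2 is a list
            return True
    return False

def contains_any_fast(c1, c2):
    # Traverse c2 once up front; membership checks become O(1) on average.
    c2_set = set(c2)
    return any(x in c2_set for x in c1)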

第 64 页,共 277 页

Performance tests are written by developers

第 65 页,共 277 页

Detecting redundant traversals

Toddler [ICSE 13]

第 66 页,共 277 页

Static analysis techniques alone are not effective

Challenges:

How to confirm the validity of the bug?

How to expose the root cause?

Execution trace can be helpful

How to detect that the performance bug is fixed?

第 67 页,共 277 页

Automated tests not effective for performance bugs

Toddler[ICSE 13]

第 68 页,共 277 页

Challenges involved in writing performance tests

Virtual call resolution: generating tests for all possible resolutions of a method invocation is not scalable

Generating appropriate context: realization of the defect can depend on conditions that affect the reachability of the inefficient loop

Arrangement of elements: the problem may only occur when the data structure has many elements arranged in a particular fashion

第 69 页,共 277 页

Glider

We propose a novel and scalable approach to automatically generate tests for exposing loop inefficiencies

第 70 页,共 277 页

Glider is available online

https://drona.csa.iisc.ernet.in/~sss/tools/glider

第 71 页,共 277 页

Performance bug caught by Glider

第 72 页,共 277 页

Results

We have implemented our approach on the Soot bytecode framework and evaluated it on a number of libraries.

Our approach detected 46 bugs across 7 Java libraries, including 34 previously unknown bugs.

Tests generated using our approach significantly outperform randomly generated tests.

第 73 页,共 277 页

Questions?

第 74 页,共 277 页

NEED FOR SPEED

accelerate tests from 3 hours to 3 minutes

emo@komfo.com

第 75 页,共 277 页

3 hours → 3 minutes, for 600 API tests

第 76 页,共 277 页

Before

After

The 3 Minute Goal

第 77 页,共 277 页

It’s not about the numbers or techniques you’ll see.

It’s all about continuous improvement.

第 78 页,共 277 页

Dedicated

Environment

第 79 页,共 277 页

Execution time in minutes: 180 → 123 (New Environment)

第 80 页,共 277 页

Empty Databases

第 81 页,共 277 页

The time needed to create data for one test:

Call 12 API endpoints

Modify data in 11 tables

Takes about 1.2 seconds

And then the test starts

第 82 页,共 277 页

Execution time in minutes: 180 → 123 → 89 (Empty Databases)

第 83 页,共 277 页

Simulate

Dependencies

第 84 页,共 277 页

Stub all external dependencies — the Core API talks only to stubs (+ some more).
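Purely to illustrate the idea of replacing an external dependency with a stub, here is a minimal canned-response HTTP stub sketch in Python (the endpoint and payload are invented; this is not the Nagual tool mentioned on the next slide):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Invented endpoint/payload standing in for an external service.
CANNED_RESPONSES = {'/users/1': {'id': 1, 'name': 'Test User'}}

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        payload = CANNED_RESPONSES.get(self.path)
        body = json.dumps(payload or {'error': 'not stubbed'}).encode()
        self.send_response(200 if payload else 404)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    # Point the Core API at localhost:8080 instead of the real service.
    HTTPServer(('localhost', 8080), StubHandler).serve_forever()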

第 85 页,共 277 页

Features needed: transparent operation, fake SSL certs, dynamic responses, local storage, returning binary data, regex URL matching.

Existing tools (March 2016): Stubby4J, WireMock, Wilma, soapUI, MockServer, mountebank, Hoverfly, Mirage.

We created project Nagual, open source soon.

第 86 页,共 277 页

Execution time in minutes: 180 → 123 → 89 → 65 (Stub Dependencies)

第 87 页,共 277 页

Move to Containers

第 88 页,共 277 页

Execution time in minutes: 180 → 123 → 89 → 65 → 104 (Using Containers)

第 89 页,共 277 页

Run Databases

in Memory

第 90 页,共 277 页

Execution time in minutes: 180 → 123 → 89 → 65 → 104 → 61 (Run Databases in Memory)

第 91 页,共 277 页

Don’t Clean

Test Data

第 92 页,共 277 页

Execution time in minutes: 180 → 123 → 89 → 65 → 104 → 61 → 46 (Don’t delete test data)

第 93 页,共 277 页

Run in Parallel

第 94 页,共 277 页

The Sweet Spot

Parallelism:             4    6    8   10   12   14   16
Time to execute (min):  12    9    7    5    8   12   17

第 95 页,共 277 页

Execution time in minutes: 180 → 123 → 89 → 65 → 104 → 61 → 46 → 5 (Run in Parallel)

第 96 页,共 277 页

Equalize Workload
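One way to equalize the batches, sketched below under the assumption that per-test durations from previous runs are available, is a greedy longest-job-first assignment to whichever batch is currently lightest:

import heapq

def equalize(test_durations, num_batches):
    """Split tests into batches with roughly equal total duration.

    test_durations: dict mapping test name -> duration in seconds.
    """
    # Each heap entry is (total duration, batch index, test names).
    batches = [(0.0, i, []) for i in range(num_batches)]
    heapq.heapify(batches)
    for name, duration in sorted(test_durations.items(),
                                 key=lambda kv: kv[1], reverse=True):
        total, i, tests = heapq.heappop(batches)
        tests.append(name)
        heapq.heappush(batches, (total + duration, i, tests))
    return [tests for _, _, tests in sorted(batches, key=lambda b: b[1])]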

第 97 页,共 277 页

第 98 页,共 277 页

第 99 页,共 277 页

Execution time in minutes: 180 → 123 (New Environment) → 89 (Empty Databases) → 65 (Stub Dependencies) → 104 (Using Containers) → 61 (Run Databases in Memory) → 46 (Don’t delete test data) → 5 (Run in Parallel) → 3 (Equal Batches)

第 100 页,共 277 页

The Outcome: 2:15 min.

After a hardware upgrade: 1:38 min.

第 101 页,共 277 页

High-level test problems: the tests are slow; the tests are unreliable; the tests can’t exactly pinpoint the problem.

Where we ended up: 3 minutes, no external dependencies, and it’s cheap to run all tests after every change.

第 102 页,共 277 页

In a couple of years, running all your automated tests after every code change in less than 3 minutes will be standard development practice.

第 103 页,共 277 页

Recommended Reading

第 104 页,共 277 页

EmanuilSlavov.com

@EmanuilSlavov

第 105 页,共 277 页

Slide #, Photo Credits

1. https://www.flickr.com/photos/thomashawk

5. https://www.flickr.com/photos/100497095@N02

7. https://www.flickr.com/photos/andrewmalone

10. https://www.flickr.com/photos/astrablog

14. https://www.flickr.com/photos/foilman

16. https://www.flickr.com/photos/missusdoubleyou

18. https://www.flickr.com/photos/canonsnapper

20. https://www.flickr.com/photos/anotherangle

23. https://www.flickr.com/photos/-aismist

第 106 页,共 277 页

Code Coverage is a Strong Predictor of

Test Suite Effectiveness

in the Real World

Rahul Gopinath

Iftekhar Ahmed

第 107 页,共 277 页

When should we stop testing?

第 108 页,共 277 页

How to evaluate test suite effectiveness?

第 109 页,共 277 页

Previous research: Do not trust coverage

(In theory)

GTAC’15 Inozemtseva

第 110 页,共 277 页

Factors affecting test suite quality

Test suite quality

Coverage

Assertions

第 111 页,共 277 页

According to previous research

Test suite quality

Coverage

Assertions

Test suite size

GTAC’15 Inozemtseva

第 112 页,共 277 页

But...

What is the adequate test suite size?

  • Is there a maximum number of test cases for a given program?
  • Are different test cases equivalent in strength?
  • How do we account for duplicate tests?
  • Test suite sizes are not comparable even for the same program.

第 113 页,共 277 页

Can I use coverage to measure

suite effectiveness?

第 114 页,共 277 页

Statement coverage best predicts mutation score

A fault in a statement has an 87% probability of being detected if an organic test covers it: M = 0.87 × S (R² = 0.94).

Results from 250 real-world programs (largest > 100 KLOC), on developer-written test suites. [Scatter plot; dot size follows project size.]

第 115 页,共 277 页

Statement coverage best predicts mutation score

A fault in a statement has a 61% probability of being detected if a generated test covers it: M = 0.61 × S (R² = 0.70).

Results from 250 real-world programs (largest > 100 KLOC), on Randoop-generated test suites. [Scatter plot; dot size follows project size.]

第 116 页,共 277 页

But

Controlling for test suite size, coverage provides little extra information.

Hence don't use coverage [GTAC’15 Inozemtseva]

Why use mutation?

Mutation score provides little extra information (<6%) compared to coverage.

第 117 页,共 277 页

Does coverage have no extra value?

                               GTAC'15 Inozemtseva           Our Research
# Programs                     5                              250
Selection of programs          Ad hoc                         Systematic sample from GitHub
Tool used                      CodeCover, PIT                 Emma, Cobertura, CodeCover, PIT
Test suites                    Random subsets of original     Organic & randomly generated (new results)
Removal of influence of size   Ad hoc                         Statistical

Our study is much larger, systematic (not ad hoc), and follows real-world usage.

Our Research (new results)

M ~ TestSuiteSize: 12.84%
M ~ log(TSize): 51.26%
residuals(M ~ log(TSize)) ~ S: 75.25%

Statement coverage can explain 75% of the variability in mutation score after eliminating the influence of test suite size.

第 118 页,共 277 页

Is mutation analysis better than coverage analysis?

第 119 页,共 277 页

Mutation analysis: High cost of analysis

Δ = b² − 4ac

d = b^2 + 4 * a * c;
d = b^2 * 4 * a * c;
d = b^2 / 4 * a * c;
d = b^2 ^ 4 * a * c;
d = b^2 % 4 * a * c;
d = b^2 << 4 * a * c;
d = b^2 >> 4 * a * c;

d = b^2 * 4 + a * c;
d = b^2 * 4 - a * c;
d = b^2 * 4 / a * c;
d = b^2 * 4 ^ a * c;
d = b^2 * 4 % a * c;
d = b^2 * 4 << a * c;
d = b^2 * 4 >> a * c;

d = b^2 * 4 * a + c;
d = b^2 * 4 * a - c;
d = b^2 * 4 * a / c;
d = b^2 * 4 * a ^ c;
d = b^2 * 4 * a % c;
d = b^2 * 4 * a << c;
d = b^2 * 4 * a >> c;

d = b + 2 - 4 * a * c;
d = b - 2 - 4 * a * c;
d = b * 2 - 4 * a * c;
d = b / 2 - 4 * a * c;
d = b % 2 - 4 * a * c;

d = b^0 - 4 * a * c;
d = b^1 - 4 * a * c;
d = b^-1 - 4 * a * c;
d = b^MAX - 4 * a * c;
d = b^MIN - 4 * a * c;

d = b^2 - 0 * a * c;
d = b^2 - 1 * a * c;
d = b^2 - (-1) * a * c;
d = b^2 - MAX * a * c;
d = b^2 - MIN * a * c;

第 120 页,共 277 页

Mutation score is very costly

第 121 页,共 277 页

Mutation analysis: Equivalent mutants

Δ = b² − 2²ac

d = b^2 - (2^2) * a * c;
d = b^2 - (2*2) * a * c;
d = b^2 - (2+2) * a * c;

Mutants

Original

Equivalent Mutant

Normal Mutant

Or: Do not trust low mutation scores

第 122 页,共 277 页

Low mutation score does not indicate a low quality test suite.

第 123 页,共 277 页

Mutation analysis: Equivalent mutants

Δ = b² − 2²ac

d = b^2 - (-4) * a * c;
d = b^2 + 4 * a * c;
d = (-b)^2 - 4 * a * c;

Mutants

Original

Equivalent Mutant

Redundant Mutant

Or: Do not trust low mutation scores

第 124 页,共 277 页

High mutation score does not indicate a high quality test suite.

第 125 页,共 277 页

Mutation Analysis: Different Operators

Δ = b² − 4ac

d = b^2 + 4 * a * c;

>>> dis.dis(d)

2 0 LOAD_FAST 0 (b)

3 LOAD_CONST 1 (2)

6 LOAD_CONST 2 (4)

9 LOAD_FAST 1 (a)

12 BINARY_MULTIPLY

13 LOAD_FAST 2 (c)

16 BINARY_MULTIPLY

17 BINARY_SUBTRACT

18 BINARY_XOR

19 RETURN_VALUE x

[2016 Software Quality Journal]

第 126 页,共 277 页

Mutation score is not a consistent measure

第 127 页,共 277 页

Does a high coverage test suite

actually prevent bugs?

第 128 页,共 277 页

We looked at bugfixes on actual programs

An uncovered line is twice as likely to have a bug fix as a line covered by any test case.

[FSE 2016]

Difference in bug-fixes between covered and uncovered program elements:

            Covered   Uncovered   p
Statement   0.68      1.20        0.00
Block       0.42      0.83        0.00
Method      0.40      0.87        0.00
Class       0.45      0.32        0.10

第 129 页,共 277 页

Does a high coverage test suite

actually prevent bugs?

Yes it does

第 130 页,共 277 页

Summary

Do not dismiss coverage lightly

Beware of mutation analysis caveats

Coverage is a pretty good heuristic on where the bugs hide.

  • Coverage is highly correlated with mutation score (92%)
  • Coverage provides 75% more information than just test suite size.

  • Mutation score provides little extra information compared to coverage.
  • Mutation score can be unreliable.

第 131 页,共 277 页

Assume non-equivalent, non-redundant, uniform fault distribution for mutants

at one’s own peril.

Beware of theoretical spherical cows…

第 132 页,共 277 页

Backup slides

第 133 页,共 277 页

That is,

  • Coverage is highly correlated with mutation score (92%)
  • Mutation score provides little extra information compared to coverage.
  • Coverage provides 75% more information than just test suite size.
  • Mutation score can be unreliable.
  • Coverage thresholds actually help reduce incidence of bugs.

第 134 页,共 277 页

Mutation X Path Coverage

第 135 页,共 277 页

Mutation X Branch Coverage

第 136 页,共 277 页

Computations

require(Coverage)

data(o.db)

o <- subset(subset(o.db, tloc != 0), select=c('pit.mutation.cov', 'cobertura.line.cov', 'loc', 'tloc'))

o$l.tloc <- log2(o$tloc)

oo <- subset(o, l.tloc != -Inf)

ooo <- na.omit(oo)

> cor.test(pit.mutation.cov,tloc)

t = 1.973, df = 232, p-value = 0.04969

95 percent confidence interval: 0.0002148688 0.2525430013

sample estimates: cor 0.1284574

> cor.test(pit.mutation.cov,l.tloc)

data: pit.mutation.cov and l.tloc

t = 9.0938, df = 232, p-value < 2.2e-16

95 percent confidence interval: 0.4114269 0.6013377

sample estimates: cor 0.5126249

> cor.test(resid(lm(pit.mutation.cov~log(tloc))),cobertura.line.cov)

data: resid(lm(pit.mutation.cov ~ log(tloc))) and cobertura.line.cov

t = 17.406, df = 232, p-value < 2.2e-16

95 percent confidence interval: 0.6909857 0.8032663

sample estimates: cor 0.7525441

> summary(lm(pit.mutation.cov~log(tloc)))

Estimate Std. Error t value Pr(>|t|)

(Intercept) -0.13644 0.06031 -2.262 0.0246 *

log(tloc) 0.09950 0.01094 9.094 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2839 on 232 degrees of freedom

Multiple R-squared: 0.2628, Adjusted R-squared: 0.2596

F-statistic: 82.7 on 1 and 232 DF, p-value: < 2.2e-16

> summary(lm(pit.mutation.cov~log(tloc)+cobertura.line.cov))

Estimate Std. Error t value Pr(>|t|)

(Intercept) -0.074859 0.031645 -2.366 0.018828 *

log(tloc) 0.023658 0.006487 3.647 0.000328 ***

cobertura.line.cov 0.785488 0.031628 24.836 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1485 on 231 degrees of freedom

Multiple R-squared: 0.7991, Adjusted R-squared: 0.7974

F-statistic: 459.5 on 2 and 231 DF, p-value: < 2.2e-16

第 137 页,共 277 页

Does Mutation score correlate to fixed bugs?

第 138 页,共 277 页

Mutant semiotics (how faults map to failures) is not well understood

Affected by factors of the particular project

  • Style of development, coding guidelines etc
  • Complexity of algorithms
  • Coupling between modules

第 139 页,共 277 页

Can weak mutation analysis help?

Rather than the failure of a test case for a mutant, we only require a change in state. It is easier to compute, but:

  • Does not verify assertions
  • So, Just another coverage technique
  • Redundant and Equivalent mutants remain

第 140 页,共 277 页

Method

250 real world projects from Github, largest > 100 KLOC.

Tests

Developer written

Randoop generated

[Table: which tools were used for each measure — Statement, Branch, Path, and Mutation — covering Emma, Cobertura, CodeCover, JMockit (coverage) and PIT, Major, Judy (mutation)]

第 141 页,共 277 页

Mutation analysis has a number of other problems

  • Mutants are not similar in their difficulty to kill
    • So a test suite that is optimized for killing difficult mutants is at a disadvantage
  • Coupling effect has not been validated for complex systems
    • According to Wah, the coupling will decrease as the system gets larger.

第 142 页,共 277 页

The fault distribution may not be uniform

A majority of mutants are very easy to kill, but some are stubborn.

Do two test suites with, say, a 50% mutation score have the same strength?

Test suites optimized for harder-to-detect faults are penalized.

第 143 页,共 277 页

Correlation does not imply causation?

It was pointed out in the previous talk that correlation between coverage and mutation score does not imply a causal relationship between the two. We can counter this:

Logic

A test suite with zero coverage will not kill any mutants.

A test suite can only kill mutants on the lines it covers.

Statistically

Using additive noise models to identify cause and effect. (ongoing research)

第 144 页,共 277 页

ClusterRunner

Making fast test-feedback easy through horizontal scaling.

Joseph Harrington and Taejun Lee, Productivity Engineering

第 145 页,共 277 页

What is ClusterRunner?

第 146 页,共 277 页

第 147 页,共 277 页

第 148 页,共 277 页

Functional

Tests

Integration Tests

Unit Tests

Manual

Tests

第 149 页,共 277 页

第 150 页,共 277 页

第 151 页,共 277 页

Develop

Test

Feature

Design

Release

第 152 页,共 277 页

Develop

Test

Feature

Design

Release

第 153 页,共 277 页

PHPUnit testsuite duration at Box

第 154 页,共 277 页

第 155 页,共 277 页

“A problem isn’t a problem if you can throw money at it.”

第 156 页,共 277 页

第 157 页,共 277 页

PHPUnit

Scala SBT

nosetests

QUnit

JUnit

第 158 页,共 277 页

Requirements

Easy to configure and use

Test technology agnostic

Fast test feedback

第 159 页,共 277 页

第 160 页,共 277 页

www.ClusterRunner.com

第 161 页,共 277 页

Our 30-hour testsuite: 17 minutes

第 162 页,共 277 页

ClusterRunner in Action

  • Bring up a cluster
  • Set up your project
  • Execute a build
  • Look at the results

第 163 页,共 277 页

Bring up a Cluster

# On master.box.com
clusterrunner master --port 43000

# On slave1.box.com, slave2.box.com
clusterrunner slave --master-url master.box.com:43000

第 164 页,共 277 页

Bring up a Cluster

http://master.box.com:43000/v1/slave/

第 165 页,共 277 页

第 166 页,共 277 页

Set up Your Project

  • Create clusterrunner.yaml at the root of your project repo.
    • Commands to run
    • How to distribute

第 167 页,共 277 页

第 168 页,共 277 页

Set up Your Project

> phpunit ./test/php/EarthTest.php

> phpunit ./test/php/WindTest.php

> phpunit ./test/php/FireTest.php

> phpunit ./test/php/WaterTest.php

> phpunit ./test/php/HeartTest.php

第 169 页,共 277 页

Execute a Build

Now we’re ready to build!

clusterrunner build --master-url master.box.com:43000 git --url http://github.com/myproject --job-name PHPUnit

第 170 页,共 277 页

第 171 页,共 277 页

View Build Results

http://master.box.com:43000/v1/build/1/

第 172 页,共 277 页

第 173 页,共 277 页

View Build Results

http://master.box.com:43000/v1/build/1/subjob/

第 174 页,共 277 页

第 175 页,共 277 页

View Build Results

http://master.box.com:43000/v1/build/1/result

第 176 页,共 277 页

第 177 页,共 277 页

第 178 页,共 277 页

第 179 页,共 277 页

第 180 页,共 277 页

What’s next for ClusterRunner

  • AWS integration with autoscaling
  • Docker support
  • Improvements to deployment mechanism
  • In-place upgrades
  • Web UI

第 181 页,共 277 页

clusterrunner.com

Get Involved!

第 182 页,共 277 页

productivity@box.com

Contact Us

第 183 页,共 277 页

Multi-device Testing

E2E test infra for mobile products of today and tomorrow

angli@google.com

adorokhine@google.com

第 184 页,共 277 页

Overview

E2E testing challenges

Introducing Mobly

Sample test

Controlling Android devices

Custom controller

Demo

第 185 页,共 277 页

E2E Testing

Unit Tests

Integration/Component Tests

E2E Tests

Testing Pyramid

Where magic dwells

第 186 页,共 277 页

E2E Testing is Important

Applications involving multiple devices

P2P data transfer, nearby discovery

Product under test is not a conventional device.

Internet-Of-Things, VR

Need to control and vary physical environment

RF: Wi-Fi router, attenuators

Lighting, physical position

Interact with other software/cloud services

iPerf server, cloud service backend, network components

第 187 页,共 277 页

E2E Testing is Hard!

Most test frameworks are for single-device app testing

Need to trigger complex actions on devices

Some may need system privilege

Need to synchronize steps between multiple devices

Logic may be centralized (hard to write) or decentralized (hard to trigger)

Need to drive a wide range of equipment

attenuator, call box, power meter, wireless AP etc

Need to communicate with cloud services

Need to collect debugging artifacts from many sources

第 188 页,共 277 页

Our Solution - Mobly

Lightweight Python framework (Py2/3 compatible)

Test logic runs on a host machine

Controls a collection of devices/equipment in a test bed

Bundled with controller library for essential equipment

Android device, power meter, etc

Flexible and pluggable

Custom controller module for your own toys

Open source and ready to go!

第 189 页,共 277 页

Mobly Architecture

A test bed consists of a computer running Mobly and the test script, together with the devices and equipment it controls: mobile devices, a network switch, an attenuator, a call box, and cloud services. A surrounding test harness handles test bed allocation, device provisioning, and results aggregation.

第 190 页,共 277 页

Sample Tests

Hello from the other side


第 191 页,共 277 页

Describe a Test Bed

{
  'testbed': [{
    'name': 'SimpleTestBed',
    'AndroidDevice': '*'
  }],
  'logpath': '/tmp/mobly_logs'
}

第 192 页,共 277 页

Test Script - Hello!

from mobly import base_test
from mobly import test_runner
from mobly.controllers import android_device

class HelloWorldTest(base_test.BaseTestClass):

    def setup_class(self):
        self.ads = self.register_controller(android_device)
        self.dut1 = self.ads[0]

    def test_hello_world(self):
        self.dut1.sl4a.makeToast('Hello!')

if __name__ == '__main__':
    test_runner.main()

Invocation:
$ ./path/to/hello_world_test.py -c path/to/config.json

第 193 页,共 277 页

Beyond the Basics

Config:

{
  'testbed': [{
    ...
  }],
  'logpath': '/tmp/mobly_logs',
  'toast_text': 'Hey there!'
}

Code:

self.user_params['toast_text']  # 'Hey there!'

第 194 页,共 277 页

Beyond the Basics

Device-specific logger:

self.caller.log.info("I did something.")
# <timestamp> [AndroidDevice|<serial>] I did something

Device-specific info:

In the test bed config:
'AndroidDevice': [{'serial': 'xyz', 'label': 'caller'},
                  {'serial': 'abc', 'label': 'callee', 'phone_number': '123456'}]

In code:
self.callee = android_device.get_device(self.ads, label='callee')
self.callee.phone_number  # '123456'

第 195 页,共 277 页

Controlling Android Devices

adb/shell

UI

API Calls

Custom Java Logic

第 196 页,共 277 页

Controlling Android Devices

adb

ad.adb.shell('pm clear com.my.package')

UI automator

ad.uia = uiautomator.Device(serial=ad.serial)

ad.uia(text='Hello World!').wait.exists(timeout=1000)

Android API calls, including system/hidden APIs, via SL4A

ad.sl4a.wifiConnect({'SSID': 'GoogleGuest'})

Custom Java logic

ad.register_snippets('trigger', 'com.my.package.snippets')

ad.trigger.myImpeccableLogic(5)

第 197 页,共 277 页

System API Calls

> self.dut.sl4a.makeToast('Hello World!')

SL4A (Scripting Layer for Android) is an RPC service exposing API calls on Android

self.dut.api is the RPC client for SL4A.

Original version works on regular Android builds.

Fork in AOSP can make direct system privileged calls (system/hidden APIs).

第 198 页,共 277 页

Custom Snippets

SL4A is not sufficient

SL4A methods are mapped to Android APIs, but tests need more than just Android API calls.

Current AOSP SL4A requires system privilege

Custom snippets allow users to define custom methods that do anything they want.

Custom snippets can be used with other useful libs like Espresso

第 199 页,共 277 页

Custom Snippets

package com.mypackage.testing.snippets.example;

public class ExampleSnippet implements Snippet {
  public ExampleSnippet(Context context) {}

  @Rpc(description = "Returns a string containing the given number.")
  public String getFoo(Integer input) {
    return "foo " + input;
  }

  @Override
  public void shutdown() {}
}

第 200 页,共 277 页

Custom Snippets

Add your snippet classes to AndroidManifest.xml for the androidTest apk

<meta-data
    android:name='mobly-snippets'
    android:value='com.my.app.test.MySnippet1,
                   com.my.app.test.MySnippet2' />

Compile it into an apk

apply plugin: 'com.android.application'

dependencies {
  androidTestCompile 'com.google.android.mobly:snippetlib:0.0.1'
}

第 201 页,共 277 页

Custom Snippets

Install the apk on your device

Load and call it

ad.load_snippets(name='snippets',
                 package='com.mypackage.testing.snippets.example')
foo = ad.snippets.getFoo(2)  # 'foo 2'

第 202 页,共 277 页

Espresso in Custom Snippets

import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.action.ViewActions.swipeUp;
import static android.support.test.espresso.matcher.ViewMatchers.withId;

public class ExampleSnippet implements Snippet {
  public ExampleSnippet(Context context) {}

  @Rpc(description = "Performs a swipe using Espresso")
  public void performSwipe() {
    onView(withId(R.id.my_view_id)).perform(swipeUp());
  }
}

第 203 页,共 277 页

Custom Controllers

Plug in your own toys

第 204 页,共 277 页

Loose Controller Interface

def create(configs):
    '''Instantiate controller objects.'''

def destroy(objects):
    '''Destroy controller objects.'''

def get_info(objects):
    '''[Optional] Get controller info for the test summary.'''
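For illustration, a hypothetical 'car' controller module implementing this interface might look like the sketch below; the module path and Car class are invented, and only the create/destroy/get_info contract comes from Mobly:

# my/project/testing/controllers/car.py  (hypothetical controller module)

class Car:
    def __init__(self, config):
        self.config = config

    def drive(self):
        pass  # talk to the real vehicle or equipment here

def create(configs):
    # 'configs' is the list found under this controller's key in the test bed.
    return [Car(config) for config in configs]

def destroy(cars):
    for car in cars:
        pass  # power down / release each controller object here

def get_info(cars):
    # Optional: whatever is returned here is recorded in the test summary.
    return [car.config for car in cars]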

第 205 页,共 277 页

Using Custom Controllers

from my.project.testing.controllers import car

def setup_class(self):
    self.cars = self.register_controller(car)

def test_something(self):
    self.cars[0].drive()

第 206 页,共 277 页

Video Demo

  1. A test bed with two phones and one watch.
  2. Phone A gives the voice command to watch.
  3. Watch initiates a call to phone B.
  4. Phone B gets a ringing call notification.
  5. Phone A hangs up.

第 207 页,共 277 页

Video Demo

第 208 页,共 277 页

Coming Soon

iOS controller libs

Dependent on libimobiledevice

KIFTest, XCTest, XCUITest

Async events in snippets

Standard snippet and python utils for basic Android operations

Support non-Nexus Android devices

第 209 页,共 277 页

Thank You!

Questions?

第 210 页,共 277 页

Scale vs Value

Test Automation at the BBC

David Buckhurst & Jitesh Gosai

第 211 页,共 277 页

第 212 页,共 277 页

第 213 页,共 277 页

第 214 页,共 277 页

第 215 页,共 277 页

Lots of innovation

Chair hive

第 216 页,共 277 页

第 217 页,共 277 页

第 218 页,共 277 页

第 219 页,共 277 页

第 220 页,共 277 页

第 221 页,共 277 页

第 222 页,共 277 页

第 223 页,共 277 页

第 224 页,共 277 页

第 225 页,共 277 页

第 226 页,共 277 页

第 227 页,共 277 页

第 228 页,共 277 页

第 229 页,共 277 页

第 230 页,共 277 页

Live

Insights

&

Operational

Notifications

第 231 页,共 277 页

第 232 页,共 277 页

Scale vs Value

第 233 页,共 277 页

www.bbc.co.uk/opensource

@BBCOpenSource

@davidbuckhurst @JitGo

第 234 页,共 277 页

Finding bugs in

C/C++ libraries using

libFuzzer

Kostya Serebryany, GTAC 2016

第 235 页,共 277 页

Agenda

  • What is fuzzing
  • Why fuzz
  • What to fuzz
  • How to fuzz
    • … with libFuzzer
  • Demo (CVE-2016-5179)

第 236 页,共 277 页

What is Fuzzing

  • Somehow generate a test input�
  • Feed it to the code under test�
  • Repeat

第 237 页,共 277 页

Why fuzz

  • Bugs specific to C/C++ that require the sanitizers to catch:
    • Use-after-free, buffer overflows, Uses of uninitialized memory, Memory leaks
  • Arithmetic bugs:
    • Div-by-zero, Int/float overflows, bitwise shifts by invalid amount
  • Plain crashes:
    • NULL dereferences, Uncaught exceptions
  • Concurrency bugs:
    • Data races, Deadlocks
  • Resource usage bugs:
    • Memory exhaustion, hangs or infinite loops, infinite recursion (stack overflows)
  • Logical bugs:
    • Discrepancies between two implementations of the same protocol (example)
    • Assertion failures

第 238 页,共 277 页

What to fuzz

  • Anything that consumes untrusted or complicated inputs:
    • Parsers of any kind (xml, pdf, truetype, ...)
    • Media codecs (audio, video, raster & vector images, etc)
    • Network protocols, RPC libraries (gRPC)
    • Crypto (boringssl, openssl)
    • Compression (zip, gzip, bzip2, brotli, …)
    • Compilers and interpreters (PHP, Perl, Python, Go, Clang, …)
    • Regular expression matchers (PCRE, RE2, libc’s regcomp)
    • Text/UTF processing (icu)
    • Databases (SQLite)
    • Browsers, text editors/processors (Chrome, OpenOffice)
  • OS Kernels (Linux), drivers, supervisors and VMs
  • UI (Chrome UI)

第 239 页,共 277 页

How to fuzz

  • Generation-based fuzzing
    • Usually a target-specific grammar-based generator�
  • Mutation-based fuzzing
    • Acquire a corpus of test inputs
    • Apply random mutations to the inputs�
  • Guided mutation-based fuzzing
    • Execute mutations with coverage instrumentation
    • If new coverage is observed, the mutation is permanently added to the corpus (a toy sketch follows this list)
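To make the loop concrete, here is a toy sketch of guided mutation-based fuzzing in Python. It illustrates the idea only (it is not libFuzzer); the 'target' callback returning a set of coverage IDs for an input is an assumption of the sketch.

import random

def toy_guided_fuzzer(target, seed_corpus, iterations=10000):
    # target(data: bytes) -> set of coverage ids; seed_corpus: list of bytes.
    corpus = list(seed_corpus)
    seen_coverage = set()
    for data in corpus:
        seen_coverage |= target(data)
    for _ in range(iterations):
        data = bytearray(random.choice(corpus))
        # Apply one random mutation: flip a bit, insert a byte, or delete one.
        op = random.choice(('flip', 'insert', 'delete'))
        pos = random.randrange(len(data) + 1)
        if op == 'flip' and data:
            data[pos % len(data)] ^= 1 << random.randrange(8)
        elif op == 'insert':
            data.insert(pos, random.randrange(256))
        elif op == 'delete' and data:
            del data[pos % len(data)]
        coverage = target(bytes(data))
        if coverage - seen_coverage:        # new coverage observed
            seen_coverage |= coverage
            corpus.append(bytes(data))      # keep the mutation permanently
    return corpus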

第 240 页,共 277 页

Fuzz Target - a C/C++ function worth fuzzing

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t DataSize) {
  if (DataSize >= 4 &&
      Data[0] == 'F' &&
      Data[1] == 'U' &&
      Data[2] == 'Z' &&
      Data[3] == 'Z')
    DoMoreStuff(Data, DataSize);
  return 0;
}

第 241 页,共 277 页

libFuzzer - an engine for guided in-process fuzzing

  • libFuzzer: a library; provides main()
  • Build your target code with extra compiler flags
  • Link your target with libFuzzer
  • Pass a directory with the initial test corpus and run

% clang++ -g my-code.cc libFuzzer.a -o my-fuzzer \

-fsanitize=address -fsanitize-coverage=trace-pc-guard

% ./my-fuzzer MY_TEST_CORPUS_DIR

第 242 页,共 277 页

CVE-2016-5179 (c-ares, asynchronous DNS requests)

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t DataSize) {
  unsigned char *buf;
  int buflen;
  std::string s(reinterpret_cast<const char *>(Data), DataSize);
  ares_create_query(s.c_str(), ns_c_in, ns_t_a, 0x1234, 0, &buf, &buflen, 0);
  free(buf);
  return 0;
}

第 243 页,共 277 页

第 244 页,共 277 页

present perfect => present continuous

  • “The project X has been fuzzed, hence it is somewhat secure”�
  • False:
    • Bug discovery techniques evolve
    • The project X evolves
    • Fuzzing is CPU intensive and needs time to find bugs�
  • “The project X is being continuously fuzzed, the code coverage is monitored.”
    • Much better!

第 245 页,共 277 页

Oss-fuzz - fuzzing as a service for OSS

Based on ClusterFuzz, the fuzzing backend used for fuzzing Chrome components

Supported engines: libFuzzer, AFL, Radamsa, ...

https://github.com/google/oss-fuzz

第 246 页,共 277 页

Q&A

第 247 页,共 277 页

Can MongoDB Recover from Catastrophe?

How I learned to crash a server

{ name : "Jonathan Abrahams",

title : "Senior Quality Engineer",

location : "New York, NY",

twitter : "@MongoDB",

facebook : "MongoDB" }

第 248 页,共 277 页

A machine may crash for a variety of reasons:

  • Termination of virtual machine or host
  • Hardware failure
  • OS failure

Machine crash or application crash → unexpected termination of mongod

第 249 页,共 277 页

Why do we need to crash a machine?

We could abort mongod, but this would not fully simulate an unexpected crash of a machine or OS (kernel):

Immediate loss of power may prevent cached I/O from being flushed to disk.

A kernel panic can leave an application (and its data) in an unrecoverable state.

第 250 页,共 277 页

System restart → system passes h/w & s/w checks → mongod goes into recovery mode → mongod ready for client connections

第 251 页,共 277 页

How can we crash a machine?

We started by crashing the machine manually, by pulling the cord.

We evolved to using an appliance timer, which would power the machine off/on every 15 minutes.

We also figured out that setting up a cron job to send an internal crash command (more on this later) to the machine for a random period would do the job.

And then we realized, we need to do it a bit more often.

第 252 页,共 277 页

How did we really crash that machine, and can we do it over and over and over and over...?

第 253 页,共 277 页

Why do we need to do it over and over and over?

A crash of a machine may be catastrophic. In order to uncover any subtle recovery bugs, we want to repeatedly crash a machine and test if it has recovered. A failure may only be encountered 1 out of 100 times!

第 254 页,共 277 页

Ubiquiti mPower PRO to the rescue!

Programmable power device, with ssh access from LAN via WiFi or Ethernet.

第 255 页,共 277 页

How do we turn off and on the power?

ssh admin@mpower

local outlet="output1"

# Send power cycle to mFi mPower to specified outlet

echo 0 > /dev/$outlet

sleep 10

echo 1 > /dev/$outlet

第 256 页,共 277 页

Physical vs. Virtual

It is necessary to test both types of machines, as machine crashes differ and the underlying host OS and hardware may provide different I/O caching and data protection. Virtual machines typically rely on shared resources, while physical machines typically use dedicated resources.

第 257 页,共 277 页

How do we crash a virtual machine?

We can crash it from the VM host:

KVM (Kernel-based VM): virsh destroy <vm>

VmWare: vmrun stop <vm> hard

第 258 页,共 277 页

How do we restart a crashed VM?

We can restart it from the VM host:

KVM (Kernel-based VM): virsh start <vm>

VmWare: vmrun start <vm>

第 259 页,共 277 页

How else can we crash a machine?

We can crash it using the magical SysRq key sequence (Linux only):

echo 1 | sudo tee /proc/sys/kernel/sysrq

echo b | sudo tee /proc/sysrq-trigger

第 260 页,共 277 页

How do we get the machine to restart?

Enable the BIOS setting to boot up after AC power is provided.

第 261 页,共 277 页

Restarting a Windows Machine

To disable a Windows machine from prompting you after unexpected shutdown:

bcdedit /set {default} bootstatuspolicy ignoreallfailures

bcdedit /set {current} bootstatuspolicy ignoreallfailures

bcdedit /timeout 5

第 262 页,共 277 页

The machine is running

Now that we figured out how to get our machine to crash and restart, we restart the mongod and it will go into recovery mode.

第 263 页,共 277 页

Recovery mode of mongod

Performed automatically when mongod starts, if there was an unclean shutdown detected.

WiredTiger starts from the last stable copy of the data on disk from the last checkpoint. The journal log is then applied and a new checkpoint is taken.

第 264 页,共 277 页

Before the crash!

Stimulate mongod by running several simultaneous (mongo shell) clients which provide a moderate load utilizing nearly all supported operations. This is important, as CRUD operations will cause mongod to perform I/O operations, which should never lead to file or data corruption.

第 265 页,共 277 页

Options, options

Client operations optionally provide:

Checkpoint document

Write & Read concerns

The mongod process is tested in a variety of modes, including:

Standalone or single-node replica set

Storage engine, e.g. mmapv1 or wiredTiger

第 266 页,共 277 页

What do we do after mongod has restarted?

After the machine has been restarted, we start mongod on a private port and it goes into recovery mode. Once that completes, we perform further client validation, via mongo (shell):

serverStatus

Optionally, run validate against all databases and collections

Optionally, verify if a checkpoint document exists

Failure to recover, connect to mongod, or perform the other validation steps is considered a test failure.
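For illustration, the same post-recovery checks can be scripted with the Python driver; a minimal sketch (the talk uses the mongo shell, and the port, timeout, and error handling here are assumptions):

from pymongo import MongoClient

# Connect to the privately-bound mongod that just finished recovery.
client = MongoClient('localhost', 27018, serverSelectionTimeoutMS=60000)
print(client.admin.command('serverStatus')['uptime'])

for db_name in client.list_database_names():
    db = client[db_name]
    for coll_name in db.list_collection_names():
        result = db.command('validate', coll_name)
        if not result.get('valid'):
            raise AssertionError('%s.%s failed validation' % (db_name, coll_name))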

第 267 页,共 277 页

What do we do after mongod has restarted?

Now that the recovery validation has passed, we will proceed with the pre-crash steps:

Stop and restart mongod on a public port

Start new set of (mongo shell) clients to perform various DB operations

第 268 页,共 277 页

Why do we care about validation?

The validate command checks the structures within a namespace for correctness by scanning the collection’s data and indexes. The command returns information regarding the on-disk representation of the collection.

Failing validation indicates that something has been corrupted, most likely due to an incomplete I/O operation during the unexpected shutdown.

第 269 页,共 277 页

Failure analysis

Since our developers could be local (NYC) or worldwide (Boston, Sydney), we want a self-service application they can use to reproduce reported failures. A bash script has been developed which can execute on both local hardware and in the cloud (AWS).

We save any artifacts useful for our developers to be able to analyze the failure:

Backup data files before starting mongod

Backup data files after mongod completes recovery

mongod and mongo (shell) log files

第 270 页,共 277 页

The crash testing helped to:

Extend our testing to scenarios not previously covered

Provide local and remote teams with tools to reproduce and analyze failures

Improve robustness of the mongod storage layer

第 271 页,共 277 页

Results, results

Storage engine bugs were discovered from the power cycle testing and led to fixes/improvements.

We have plans to incorporate this testing into our continuous integration.

第 272 页,共 277 页

Some bugs discovered

SERVER-20295 Power cycle test - mongod fails to start with invalid object size in storage.bson

SERVER-19774 WT_NOTFOUND: item not found during DB recovery

SERVER-19692 Mongod failed to open connection, remained in hung state, when running WT with LSM

SERVER-18838 DB fails to recover creates and drops after system crash

SERVER-18379 DB fails to recover when specifying LSM, after system crash

SERVER-18316 Database with WT engine fails to recover after system crash

SERVER-16702 Mongod fails during journal replay with mmapv1 after power cycle

SERVER-16021 WT failed to start with "lsm-worker: Error in LSM worker thread 2: No such file or directory"

第 273 页,共 277 页

Open issues?

Can we crash Windows using an internal command (cue the laugh track…)?

第 274 页,共 277 页

Closing remarks

第 275 页,共 277 页

Organizing committee

Alan Myrvold

Amar Amte

Andrea Dawson

Ari Shamash

Carly Schaeffer

Dan Giovannelli

David Aristizabal

Diego Cavalcanti

Jaydeep Mehta

Joe Drummey

Josephine Chandra

Kathleen Li

Lena Wakayama

Lesley Katzen

Madison Garcia

Matt Lowrie

Matthew Halupka

Sonal Shah

Travis Ellett

Yvette Nameth

第 276 页,共 277 页

London 2017

第 277 页,共 277 页

GTAC 2017

testing.googleblog.com