
Welcome Back!


Developer Experience, FTW!

Niranjan Tulpule


Software development is being democratized


Core computing platforms are more accessible than ever

[Chart: PCs vs. Smartphones & Tablets, 1983-2014, scale 0-1.5M]


We’re lowering the barrier to becoming a developer:

  • Free developer tools
  • Open Source building blocks
  • Free developer education


It’s never been easier to write apps

[Chart: Total Number of Active Apps in the App Store, 2010-2020, scale 1M-5M]


If the valuation of these 9 companies were a country's GDP, it would be in the top 50:

  • Uber: $66B
  • Snapchat: $40B
  • Whatsapp: $16B
  • Airbnb: $25B
  • Flipkart: $15B
  • Pinterest: $11B
  • Lyft: $5.5B
  • Ola Cabs: $5B
  • Gojek: $1.3B


51% Stability Issues

41% Functionality Related

7% Speed

1% Other

Classification of 1-star reviews (Sampling of Play Store reviews, May 2016)

Writing high quality apps is still hard


Compounded complexity

  • Device manufacturers & models: 2.5K+
  • OS versions: 100+
  • Carriers: ~700
  • Permutations: 100M


Improving software quality & testability by investing in Developer Experience.


Develop → Release → Monitor

Firebase Test Lab for Android


Test on your users’ devices


Use with your existing workflow


Android Studio

Command line

Jenkins

Jenkins logo by Charles Lowell and Frontside CC BY-SA 3.0 https://wiki.jenkins-ci.org/display/JENKINS/Logo


Robo crawls your app automatically


Create Espresso tests by just using your app



Millions of Tests, and counting!

After extensive evaluation of the market, we've found that Firebase Test Lab is the best product for writing and running Espresso tests directly from Android Studio, saving us tons of time and effort around automated testing.

- Timothy West, Jet


Get actionable results at your fingertips

Develop → Release → Monitor

Firebase Test Lab for Android, Play Pre-Launch Report


Pre-launch report

Pre-launch reports summarize issues found when testing your app on a wide range of devices


Apps using the Play Pre-Launch Report show ~20% fewer crashes!

~60% of the crashes seen on Pre-Launch Report are fixed before public rollout.


Get actionable results at your fingertips

Develop → Release → Monitor

Firebase Test Lab for Android, Play Pre-Launch Report, Firebase Crash Reporting


Firebase Crash Reporting

Get actionable insights and comprehensive analytics whenever your users experience crashes and other errors


  • Integrate the Gradle/Pod dependency (see the sketch below)

  • 0-1 lines of init code

  • Start capturing errors!
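A hedged Java sketch of how small that surface is, assuming the 2016-era firebase-crash SDK (the class and method here are hypothetical app code, not from the talk):

import com.google.firebase.crash.FirebaseCrash;

public class CartLoader {  // hypothetical app class
    void loadCart() {
        try {
            // ... app logic that may fail ...
        } catch (RuntimeException e) {
            // Fatal crashes are captured automatically once the SDK is
            // integrated; non-fatal errors can be reported explicitly.
            FirebaseCrash.report(e);
        }
    }
}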


Clustering

fatal error A: 6K, 7K
non-fatal error A: 5K, 6K
fatal error B: 4K, 4.8K
fatal error C: 3K, 3K


Get the big picture with comprehensive metrics on app versions, OS levels and device models


Find the exact line where the error happens


Minimize the time and effort to resolve issues with data about your users’ devices


Log custom events before an error happens

// On Android
FirebaseCrash.log("Activity created.");

// On iOS
FIRCrashLog(@"Button clicked.");


Provide more context with events leading up to an error


Understand the Impact of Crashes on the Bottom Line



Fix the bug, then win them back with a timely push notification


Looking ahead

Machine learning

Compilers

Toolchains


The shift to mobile caught us by surprise...

[Chart: PCs vs. Smartphones & Tablets, 1983-2014, scale 0-1.5M]


Thank You


Docker Based Geo Dispersed Test Farm - Test Infrastructure Practice in Intel Android Program

Chen Guobing, Yu Jerry


Agenda

  • Test Infrastructure Challenges
  • Test as a Service
  • Docker Based Test Farm
  • Test Distribution
  • Technical Challenges
  • Questions


Taxonomies


Test Infrastructure Challenges

  • Maximize the use of Development Vehicles (engineering samples)
  • Maximize the use of automated testing
  • Minimize the maintenance cost of the test infra, test benches, and test assets


Test as a Service – What We Need

Anyone

Any automated Test

Any Device

Anywhere

Anytime


Target Users - Usages

  • Test on demand and automated release testing
  • Re-run failed test cases or reproduce failures
  • Automated pre-commit and post-commit testing
  • Test on demand on a developer’s own build
  • Work with other dev tools, e.g. dichotomy (bisection) checks

Target user groups: Continuous Integration Testing, QA Release Testing, Developer Testing


Docker Based Geo Dispersed Test Farm


Test Distribution

Test Catalog entries carry Capability, Platform, and Location tags; e.g., Campaign A with capability "pmeter" means "Run campaign A on XYZ platform in SH".

The Test Distributor matches: Test Campaign ← Capability → Test Bench


Technical Challenges – Anywhere, Any Device

  • DUT and Test Equipment controls

$ docker run … --device=/dev/bus/usb/001/004 --device=/dev/ttySerial0 …

  • DUT state transition management


Technical Challenges – Anyone, Any Automated Test

  • Hierarchical code maintenance

  • Easily customized

  • All-in-one delivery

  • Create once, run anywhere

Release and deliver test suites as Docker images.
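As a sketch of such an all-in-one delivery (the base image, packages, and paths below are hypothetical, not from the talk):

# Hypothetical Dockerfile bundling a test suite with its dependencies
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python android-tools-adb
COPY suite/ /opt/test-suite/
ENTRYPOINT ["/opt/test-suite/run_tests.sh"]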


Questions?

Contacts:
jerry.yu@intel.com
guobing.chen@intel.com


OpenHTF

an open-source hardware testing framework

https://github.com/google/openhtf


Motivation for OpenHTF

Drastically reduce the amount of boilerplate code needed to:

  • exercise a piece of hardware
  • take measurements along the way
  • generate a record of the whole process

Make operator interactions simple but flexible.

Allow test engineers to focus on authoring actual test logic.

“Simplicity is requisite for reliability.” ~Edsger W. Dijkstra


Google:

A Software Company

...at least, it used to be!


Google:

Now With More Hardware!


Our Solution

A Python library that provides a set of convenient abstractions for authoring hardware testing code.


Use Cases

Manufacturing Floor

Automated Lab

Benchtop


Core Abstractions

  • Test: made up of Phases
  • Phase: records Measurements
  • Plug: interface to Test Equipment & the Device Under Test
  • Output Record: handed to an Output Callback (JSON to disk, upload via network, etc.)


Tests & Phases
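The code shown on this slide didn't survive extraction; as a stand-in, here is a minimal sketch of a Test with one Phase and one Measurement, modeled on the examples in the openhtf repo (exact APIs may differ between versions):

import openhtf as htf

# A phase: a function that receives the running test and does one step.
@htf.measures(htf.Measurement('widget_size').in_range(1, 10))
def size_phase(test):
    # Exercise the hardware here and record a measurement along the way.
    test.measurements.widget_size = 5

if __name__ == '__main__':
    # A test: an ordered collection of phases.
    test = htf.Test(size_phase)
    test.execute(test_start=lambda: 'widget-0001')  # DUT id for the record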


Plugs
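Likewise, a sketch of a custom plug (FlashlightPlug is hypothetical; plugs subclass plugs.BasePlug and are injected into phases with the plugs.plug decorator):

from openhtf import plugs

class FlashlightPlug(plugs.BasePlug):
    """Wraps one piece of test equipment or the DUT itself."""

    def turn_on(self):
        pass  # talk to the real hardware here

    def tearDown(self):
        pass  # release the hardware when the test finishes

@plugs.plug(flashlight=FlashlightPlug)
def flash_phase(test, flashlight):
    flashlight.turn_on()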


Web GUI


Q&A


Detecting loop inefficiencies automatically

(to appear in FSE 2016)

Monika Dhok (IISc Bangalore, India)*

Murali Krishna Ramanathan (IISc Bangalore, India)


Software efficiency is very important

Performance issues are hard to detect during testing.

These issues are found even in well-tested commercial software.

They degrade application responsiveness and user experience.


Performance bugs are critical

Implementation mistakes that cause inefficiency

Difficult for compiler optimizations to catch

Fixing them can result in large speedups, thereby improving efficiency


Redundant traversal bugs

When a program iterates over a data structure repeatedly without any intermediate modifications

public class A {
    public boolean containsAny(Collection c1, Collection c2) {
        Iterator itr = c1.iterator();
        while (itr.hasNext())
            if (c2.contains(itr.next()))
                return true;
        return false;
    }
}

Complexity: O(size(c1) × size(c2))
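A common fix, for contrast (a sketch, not code from the paper): copy c2 into a hash set once, so each membership check is O(1) and the overall cost drops to O(size(c1) + size(c2)).

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class FastA {  // hypothetical fixed version
    public boolean containsAny(Collection<?> c1, Collection<?> c2) {
        Set<Object> set2 = new HashSet<>(c2);  // one pass over c2
        for (Object o : c1)                    // one pass over c1
            if (set2.contains(o))
                return true;
        return false;
    }
}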


Performance tests are written by developers


Detecting redundant traversals

Toddler [ICSE 13]


Static analysis techniques alone are not effective

Challenges:

  • How to confirm the validity of the bug?
  • How to expose the root cause? An execution trace can be helpful.
  • How to detect that the performance bug is fixed?


Automated tests not effective for performance bugs

Toddler [ICSE 13]


Challenges involved in writing performance tests

  • Virtual call resolution: generating tests for all possible resolutions of a method invocation is not scalable
  • Generating appropriate context: realization of the defect can depend on conditions that affect the reachability of the inefficient loop
  • Arrangement of elements: the problem may only occur when the data structure has many elements arranged in a particular fashion


Glider

We propose a novel and scalable approach to automatically generate tests for exposing loop inefficiencies


Glider is available online

https://drona.csa.iisc.ernet.in/~sss/tools/glider


Performance bug caught by Glider


Results

We implemented our approach on the Soot bytecode framework and evaluated it on a number of libraries.

Our approach detected 46 bugs across 7 Java libraries, including 34 previously unknown bugs.

Tests generated using our approach significantly outperform randomly generated tests.


Questions?


NEED FOR SPEED

accelerate tests from 3 hours to 3 minutes

emo@komfo.com


3 hours → 3 minutes

600 API tests


The 3 Minute Goal (Before / After)


It’s not about the numbers or techniques you’ll see.

It’s all about continuous improvement.


Dedicated Environment


Execution Time in Minutes: 180 → 123 (New Environment)


Empty Databases


The time needed to create data for one test: call 12 API endpoints, modify data in 11 tables; takes about 1.2 seconds. And then the test starts.


Execution Time in Minutes: 180 → 123 → 89 (Empty Databases)


Simulate Dependencies


Stub all external dependencies of the Core API

[Diagram: the Core API surrounded by stubs for each external dependency, plus some more]
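Nagual itself was not yet public at the time; as a sketch of the stubbing idea only (hypothetical route and payload, Python 3 standard library), a stub with regex URL matching and canned responses can be this small:

import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical route table: URL pattern -> canned JSON payload.
ROUTES = [
    (re.compile(r'^/api/v\d+/users/\d+$'), b'{"id": 1, "name": "stub"}'),
]

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        for pattern, body in ROUTES:
            if pattern.match(self.path):
                self.send_response(200)
                self.send_header('Content-Type', 'application/json')
                self.end_headers()
                self.wfile.write(body)
                return
        self.send_response(404)
        self.end_headers()

if __name__ == '__main__':
    HTTPServer(('localhost', 8080), StubHandler).serve_forever()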


Needed: transparent, fake SSL certs, dynamic responses, local storage, return binary data, regex URL match.

Existing tools (March 2016): Stubby4J, WireMock, Wilma, soapUI, MockServer, mountebank, Hoverfly, Mirage.

We created project Nagual, open source soon.


Execution Time in Minutes: 180 → 123 → 89 → 65 (Stub Dependencies)


Move to Containers


Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 (Using Containers)


Run Databases in Memory


Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 → 61 (Run Databases in Memory)


Don’t Clean Test Data


Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 → 61 → 46 (Don’t delete test data)


Run in Parallel


Parallelism:       4    6    8   10   12   14   16
Time to execute:  12    9    7    5    8   12   17

The Sweet Spot


Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 → 61 → 46 → 5 (Run in Parallel)


Equalize Workload
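The talk shows no code for this step; a minimal sketch of the idea, greedy bin packing over durations measured in a previous run (load_durations is hypothetical):

def equalize(durations, num_batches):
    """durations: dict of test name -> seconds from the last run."""
    batches = [{'tests': [], 'total': 0.0} for _ in range(num_batches)]
    # Longest tests first, each into the currently lightest batch.
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        lightest = min(batches, key=lambda b: b['total'])
        lightest['tests'].append(name)
        lightest['total'] += secs
    return [b['tests'] for b in batches]

# e.g. batches = equalize(load_durations('last_run.json'), 10)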


Execution Time in Minutes (cumulative):

Start: 180
New Environment: 123
Empty Databases: 89
Stub Dependencies: 65
Using Containers: 104
Run Databases in Memory: 61
Don’t delete test data: 46
Run in Parallel: 5
Equal Batches: 3


The Outcome: 2:15 min. (1:38 min. after hardware upgrade)


High Level Tests Problems: the tests are slow; the tests are unreliable; the tests can’t exactly pinpoint the problem.

Where we ended up: 3 minutes, no external dependencies, and it’s cheap to run all tests after every change.


In a couple of years, running all your automated tests, after every code change, for less than 3 minutes, will be standard development practice.


Recommended Reading


EmanuilSlavov.com

@EmanuilSlavov


Slide #, Photo Credits

1. https://www.flickr.com/photos/thomashawk

5. https://www.flickr.com/photos/100497095@N02

7. https://www.flickr.com/photos/andrewmalone

10. https://www.flickr.com/photos/astrablog

14. https://www.flickr.com/photos/foilman

16. https://www.flickr.com/photos/missusdoubleyou

18. https://www.flickr.com/photos/canonsnapper

20. https://www.flickr.com/photos/anotherangle

23. https://www.flickr.com/photos/-aismist


Code Coverage is a Strong Predictor of Test Suite Effectiveness in the Real World

Rahul Gopinath

Iftekhar Ahmed


When should we stop testing?


How to evaluate test suite effectiveness?


Previous research: Do not trust coverage

(In theory)

GTAC’15 Inozemtseva


Factors affecting test suite quality

[Diagram: Coverage and Assertions → Test suite quality]


According to previous research

[Diagram: Coverage, Assertions, and Test suite size → Test suite quality]

GTAC’15 Inozemtseva


But...

What is the adequate test suite size?

  • Is there a maximum number of test cases for a given program?
  • Are different test cases equivalent in strength?
  • How do we account for duplicate tests?
  • Test suite sizes are not comparable even for the same program.


Can I use coverage to measure suite effectiveness?


Statement coverage best predicts mutation score

A fault in a statement has an 87% probability of being detected if an organic test covers it.

M = 0.87 × S (R² = 0.94)

Results from 250 real world programs, largest > 100 KLOC, on developer-written test suites. [Scatter plot; dot size follows project size.]


Statement coverage best predicts mutation score

A fault in a statement has a 61% probability of being detected if a generated test covers it.

M = 0.61 × S (R² = 0.70)

Results from 250 real world programs, largest > 100 KLOC, on Randoop-generated test suites. [Scatter plot; dot size follows project size.]


But

Controlling for test suite size, coverage provides little extra information.

Hence don't use coverage [GTAC’15 Inozemtseva]

Why use mutation?

Mutation score provides little extra information (<6%) compared to coverage.


Does coverage have no extra value?

                               GTAC’15 Inozemtseva           Our Research
# Programs                     5                             250
Selection of programs          Ad hoc                        Systematic sample from Github
Tool used                      CodeCover, PIT                Emma, Cobertura, CodeCover, PIT
Test suites                    Random subsets of original    Organic & randomly generated
Removal of influence of size   Ad hoc                        Statistical (new results)

Our study is much larger, systematic (not ad hoc), and follows real world usage.

Our Research (New results)

M ~ TestsuiteSize: 12.84%
M ~ log(TSize): 51.26%
residuals(M ~ log(TSize)) ~ S: 75.25%

Statement coverage can explain 75% of the variability in mutation score after eliminating the influence of test suite size.


Is mutation analysis better than coverage analysis?


Mutation analysis: High cost of analysis

Δ = b² - 4ac

d = b^2 + 4 * a * c;
d = b^2 * 4 * a * c;
d = b^2 / 4 * a * c;
d = b^2 ^ 4 * a * c;
d = b^2 % 4 * a * c;
d = b^2 << 4 * a * c;
d = b^2 >> 4 * a * c;

d = b^2 * 4 + a * c;
d = b^2 * 4 - a * c;
d = b^2 * 4 / a * c;
d = b^2 * 4 ^ a * c;
d = b^2 * 4 % a * c;
d = b^2 * 4 << a * c;
d = b^2 * 4 >> a * c;

d = b^2 * 4 * a + c;
d = b^2 * 4 * a - c;
d = b^2 * 4 * a / c;
d = b^2 * 4 * a ^ c;
d = b^2 * 4 * a % c;
d = b^2 * 4 * a << c;
d = b^2 * 4 * a >> c;

d = b + 2 - 4 * a * c;
d = b - 2 - 4 * a * c;
d = b * 2 - 4 * a * c;
d = b / 2 - 4 * a * c;
d = b % 2 - 4 * a * c;

d = b^0 - 4 * a * c;
d = b^1 - 4 * a * c;
d = b^-1 - 4 * a * c;
d = b^MAX - 4 * a * c;
d = b^MIN - 4 * a * c;

d = b^2 - 0 * a * c;
d = b^2 - 1 * a * c;
d = b^2 - (-1) * a * c;
d = b^2 - MAX * a * c;
d = b^2 - MIN * a * c;


Mutation score is very costly


Mutation analysis: Equivalent mutants

Original: Δ = b² - 2²ac

Mutants:
d = b^2 - (2^2) * a * c;
d = b^2 - (2*2) * a * c;
d = b^2 - (2+2) * a * c;

[Diagram: the original vs. equivalent and normal mutants]

Or: Do not trust low mutation scores


Low mutation score does not indicate a low quality test suite.


Mutation analysis: Equivalent and redundant mutants

Original: Δ = b² - 4ac

Mutants:
d = b^2 - (-4) * a * c;
d = b^2 + 4 * a * c;
d = (-b)^2 - 4 * a * c;

[Diagram: the original vs. equivalent and redundant mutants]

Or: Do not trust low mutation scores


High mutation score does not indicate a high quality test suite.


Mutation Analysis: Different Operators

Δ = b² - 4ac
d = b^2 + 4 * a * c;

>>> dis.dis(d)
  2           0 LOAD_FAST                0 (b)
              3 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (4)
              9 LOAD_FAST                1 (a)
             12 BINARY_MULTIPLY
             13 LOAD_FAST                2 (c)
             16 BINARY_MULTIPLY
             17 BINARY_SUBTRACT
             18 BINARY_XOR
             19 RETURN_VALUE

[2016 Software Quality Journal]


Mutation score is not a consistent measure


Does a high coverage test suite actually prevent bugs?


We looked at bugfixes on actual programs

An uncovered line is twice as likely to have a bug fix as a line covered by any test case. [FSE 2016]

             Covered   Uncovered      p
Statement       0.68        1.20   0.00
Block           0.42        0.83   0.00
Method          0.40        0.87   0.00
Class           0.45        0.32   0.10

Difference in bug-fixes between covered and uncovered program elements


Does a high coverage test suite actually prevent bugs? Yes, it does.


Summary

Do not dismiss coverage lightly

Beware of mutation analysis caveats

Coverage is a pretty good heuristic on where the bugs hide.

  • Coverage is highly correlated with mutation score (92%)
  • Coverage provides 75% more information than just test suite size.

  • Mutation score provides little extra information compared to coverage.
  • Mutation score can be unreliable.


Assume non-equivalent, non-redundant, uniform fault distribution for mutants at one’s own peril.

Beware of theoretical spherical cows…


Backup slides


That is,

  • Coverage is highly correlated with mutation score (92%)
  • Mutation score provides little extra information compared to coverage.
  • Coverage provides 75% more information than just test suite size.
  • Mutation score can be unreliable.
  • Coverage thresholds actually help reduce incidence of bugs.


Mutation X Path Coverage


Mutation X Branch Coverage


Computations

require(Coverage)

data(o.db)

o <- subset(subset(o.db, tloc != 0), select=c('pit.mutation.cov', 'cobertura.line.cov', 'loc', 'tloc'))

o$l.tloc <- log2(o$tloc)

oo <- subset(o, l.tloc != -Inf)

ooo <- na.omit(oo)

> cor.test(pit.mutation.cov,tloc)

t = 1.973, df = 232, p-value = 0.04969

95 percent confidence interval: 0.0002148688 0.2525430013

sample estimates: cor 0.1284574

> cor.test(pit.mutation.cov,l.tloc)

data: pit.mutation.cov and l.tloc

t = 9.0938, df = 232, p-value < 2.2e-16

95 percent confidence interval: 0.4114269 0.6013377

sample estimates: cor 0.5126249

> cor.test(resid(lm(pit.mutation.cov~log(tloc))),cobertura.line.cov)

data: resid(lm(pit.mutation.cov ~ log(tloc))) and cobertura.line.cov

t = 17.406, df = 232, p-value < 2.2e-16

95 percent confidence interval: 0.6909857 0.8032663

sample estimates: cor 0.7525441

> summary(lm(pit.mutation.cov~log(tloc)))

Estimate Std. Error t value Pr(>|t|)

(Intercept) -0.13644 0.06031 -2.262 0.0246 *

log(tloc) 0.09950 0.01094 9.094 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2839 on 232 degrees of freedom

Multiple R-squared: 0.2628, Adjusted R-squared: 0.2596

F-statistic: 82.7 on 1 and 232 DF, p-value: < 2.2e-16

> summary(lm(pit.mutation.cov~log(tloc)+cobertura.line.cov))

Estimate Std. Error t value Pr(>|t|)

(Intercept) -0.074859 0.031645 -2.366 0.018828 *

log(tloc) 0.023658 0.006487 3.647 0.000328 ***

cobertura.line.cov 0.785488 0.031628 24.836 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1485 on 231 degrees of freedom

Multiple R-squared: 0.7991, Adjusted R-squared: 0.7974

F-statistic: 459.5 on 2 and 231 DF, p-value: < 2.2e-16


Does Mutation score correlate to fixed bugs?


Mutant semiotics (how faults map to failures) is not well understood

Affected by factors of the particular project

  • Style of development, coding guidelines, etc.
  • Complexity of algorithms
  • Coupling between modules


Can weak mutation analysis help?

Rather than the failure of a test case for a mutant, we only require a change in state. It is easier to compute, but:

  • Does not verify assertions
  • So, just another coverage technique
  • Redundant and Equivalent mutants remain


Method

250 real world projects from Github, largest > 100 KLOC.

Tests: Developer written and Randoop generated.

Tool        Statement   Branch   Path   Mutation
Emma            X
Cobertura       X          X
Codecover       X          X
JMockit         X                  X
PIT             X                          X
Major                                      X
Judy                                       X


Mutation analysis has a number of other problems

  • Mutants are not similar in their difficulty to kill
    • So a test suite that is optimized for killing difficult mutants is at a disadvantage
  • Coupling effect has not been validated for complex systems
    • According to Wah, the coupling will decrease as the system gets larger.


The fault distribution may not be uniform

A majority of mutants are very easy to kill, but some are stubborn.

Do two test suites with, say, 50% mutation score have the same strength?

Test suites optimized for harder-to-detect faults are penalized.


Correlation does not imply causation?

It was pointed out in the previous talk that correlation between coverage and mutation score does not imply a causal relationship between the two. We can counter it by:

Logic

A test suite with zero coverage will not kill any mutants.

A test suite can only kill mutants on the lines it covers.

Statistically

Using additive noise models to identify cause and effect. (ongoing research)


ClusterRunner

Making fast test-feedback easy through horizontal scaling.

Joseph Harrington and Taejun Lee, Productivity Engineering


What is ClusterRunner?


[Testing pyramid: Unit Tests, Integration Tests, Functional Tests, Manual Tests]


Feature Design → Develop → Test → Release




PHPUnit testsuite duration at Box


“A problem isn’t a problem if you can throw money at it.”


PHPUnit

Scala SBT

nosetests

QUnit

JUnit


Requirements

Easy to configure and use

Test technology agnostic

Fast test feedback


www.ClusterRunner.com


Our 30-hour testsuite → 17 minutes


ClusterRunner in Action

  • Bring up a cluster
  • Set up your project
  • Execute a build
  • Look at the results


Bring up a Cluster

# On master.box.com
clusterrunner master --port 43000

# On slave1.box.com, slave2.box.com
clusterrunner slave --master-url master.box.com:43000


Bring up a Cluster

http://master.box.com:43000/v1/slave/


Set up Your Project

  • Create clusterrunner.yaml at the root of your project repo (see the sketch below).
    • Commands to run
    • How to distribute
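A sketch of what clusterrunner.yaml can look like for the PHPUnit example on the next slide; the key names below follow my reading of the ClusterRunner docs, so treat them as illustrative and check clusterrunner.com for the exact schema:

PHPUnit:
    setup_build:
        - composer install
    commands:
        - phpunit $TESTPATH
    atomizers:
        - TESTPATH: find ./test/php -name "*Test.php"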


Set up Your Project

> phpunit ./test/php/EarthTest.php

> phpunit ./test/php/WindTest.php

> phpunit ./test/php/FireTest.php

> phpunit ./test/php/WaterTest.php

> phpunit ./test/php/HeartTest.php


Execute a Build

Now we’re ready to build!

clusterrunner build \
    --master-url master.box.com:43000 \
    git \
    --url http://github.com/myproject \
    --job-name PHPUnit


View Build Results

http://master.box.com:43000/v1/build/1/


View Build Results

http://master.box.com:43000/v1/build/1/subjob/


View Build Results

http://master.box.com:43000/v1/build/1/result


What’s next for ClusterRunner

  • AWS integration with autoscaling
  • Docker support
  • Improvements to deployment mechanism
  • In-place upgrades
  • Web UI


clusterrunner.com

Get Involved!


productivity@box.com

Contact Us


Multi-device Testing

E2E test infra for mobile products of today and tomorrow

angli@google.com

adorokhine@google.com


Overview

E2E testing challenges

Introducing Mobly

Sample test

Controlling Android devices

Custom controller

Demo


E2E Testing

[Testing pyramid: Unit Tests, Integration/Component Tests, E2E Tests (where magic dwells)]


E2E Testing is Important

Applications involving multiple devices

P2P data transfer, nearby discovery

Product under test is not a conventional device.

Internet-Of-Things, VR

Need to control and vary physical environment

RF: Wi-Fi router, attenuators

Lighting, physical position

Interact with other software/cloud services

iPerf server, cloud service backend, network components


E2E Testing is Hard!

Most test frameworks are for single-device app testing

Need to trigger complex actions on devices

Some may need system privilege

Need to synchronize steps between multiple devices

Logic may be centralized (hard to write) or decentralized (hard to trigger)

Need to drive a wide range of equipment

attenuators, call boxes, power meters, wireless APs, etc.

Need to communicate with cloud services

Need to collect debugging artifacts from many sources


Our Solution - Mobly

Lightweight Python framework (Py2/3 compatible)

Test logic runs on a host machine

Controls a collection of devices/equipment in a test bed

Bundled with controller library for essential equipment

Android device, power meter, etc

Flexible and pluggable

Custom controller module for your own toys

Open source and ready to go!


Mobly Architecture

[Diagram: a Test Harness handles test bed allocation, device provisioning, and results aggregation; it runs the Mobly test script on a computer, which drives the test bed: mobile devices, a network switch, an attenuator, a call box, and a cloud service]


Sample Tests

Hello from the other side



Describe a Test Bed

{
  'testbed': [{
    'name': 'SimpleTestBed',
    'AndroidDevice': '*'
  }],
  'logpath': '/tmp/mobly_logs'
}


Test Script - Hello!

from mobly import base_test
from mobly import test_runner
from mobly.controllers import android_device

class HelloWorldTest(base_test.BaseTestClass):
    def setup_class(self):
        self.ads = self.register_controller(android_device)
        self.dut1 = self.ads[0]

    def test_hello_world(self):
        self.dut1.sl4a.makeToast('Hello!')

if __name__ == '__main__':
    test_runner.main()

Invocation:
$ ./path/to/hello_world_test.py -c path/to/config.json


Beyond the Basics

Config:
{
  'testbed': [{
    ...
  }],
  'logpath': '/tmp/mobly_logs',
  'toast_text': 'Hey there!'
}

Code:
self.user_params['toast_text']  # 'Hey there!'


Beyond the Basics

Device-specific logger:

self.caller.log.info("I did something.")
# <timestamp> [AndroidDevice|<serial>] I did something

Specific device info, in the test bed config:

'AndroidDevice': [{'serial': 'xyz', 'label': 'caller'},
                  {'serial': 'abc', 'label': 'callee',
                   'phone_number': '123456'}]

In code:

self.callee = android_device.get_device(self.ads, label='callee')
self.callee.phone_number  # '123456'


Controlling Android Devices

adb/shell

UI

API Calls

Custom Java Logic


Controlling Android Devices

adb

ad.adb.shell('pm clear com.my.package')

UI automator

ad.uia = uiautomator.Device(serial=ad.serial)

ad.uia(text='Hello World!').wait.exists(timeout=1000)

Android API calls, including system/hidden APIs, via SL4A

ad.sl4a.wifiConnect({'SSID': 'GoogleGuest'})

Custom Java logic

ad.register_snippets('trigger', 'com.my.package.snippets')

ad.trigger.myImpeccableLogic(5)


System API Calls

> self.dut.sl4a.makeToast('Hello World!')

SL4A (Scripting Layer for Android) is an RPC service exposing API calls on Android

self.dut.api is the RPC client for SL4A.

Original version works on regular Android builds.

Fork in AOSP can make direct system privileged calls (system/hidden APIs).


Custom Snippets

SL4A is not sufficient

SL4A methods are mapped to Android APIs, but tests need more than just Android API calls.

Current AOSP SL4A requires system privilege

Custom snippets allow users to define custom methods that do anything they want.

Custom snippets can be used with other useful libs like Espresso


Custom Snippets

package com.mypackage.testing.snippets.example;

public class ExampleSnippet implements Snippet {
    public ExampleSnippet(Context context) {}

    @Rpc(description = "Returns a string containing the given number.")
    public String getFoo(Integer input) {
        return "foo " + input;
    }

    @Override
    public void shutdown() {}
}


Custom Snippets

Add your snippet classes to AndroidManifest.xml for the androidTest apk:

<meta-data
    android:name="mobly-snippets"
    android:value="com.my.app.test.MySnippet1,
                   com.my.app.test.MySnippet2" />

Compile it into an apk:

apply plugin: 'com.android.application'
dependencies {
    androidTestCompile 'com.google.android.mobly:snippetlib:0.0.1'
}


Custom Snippets

Install the apk on your device

Load and call it

ad.load_snippets(name='snippets',
                 package='com.mypackage.testing.snippets.example')
foo = ad.snippets.getFoo(2)  # 'foo 2'


Espresso in Custom Snippets

import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.action.ViewActions.swipeUp;
import static android.support.test.espresso.matcher.ViewMatchers.withId;

public class ExampleSnippet implements Snippet {
    public ExampleSnippet(Context context) {}

    @Rpc(description = "Performs a swipe using espresso")
    public void performSwipe() {
        onView(withId(R.id.my_view_id)).perform(swipeUp());
    }
}


Custom Controllers

Plug in your own toys


Loose Controller Interface

def create(configs):
    '''Instantiate controller objects.'''

def destroy(objects):
    '''Destroy controller objects.'''

def get_info(objects):
    '''[optional] Get controller info for test summary.'''


Using Custom Controllers

from my.project.testing.controllers import car

def setup_class(self):
    self.cars = self.register_controller(car)

def test_something(self):
    self.cars[0].drive()


Video Demo

  1. A test bed with two phones and one watch.
  2. Phone A gives the voice command to the watch.
  3. The watch initiates a call to phone B.
  4. Phone B gets a ringing call notification.
  5. Phone A hangs up.


Video Demo


Coming Soon

iOS controller libs

Dependent on libimobiledevice

KIFTest, XCTest, XCUITest

Async events in snippets

Standard snippets and Python utils for basic Android operations

Support non-Nexus Android devices


Thank You!

Questions?


Scale vs Value

Test Automation at the BBC

David Buckhurst & Jitesh Gosai


Lots of innovation

Chair hive


Live Insights & Operational Notifications


Scale vs Value


www.bbc.co.uk/opensource

@BBCOpenSource

@davidbuckhurst @JitGo


Finding bugs in C/C++ libraries using libFuzzer

Kostya Serebryany, GTAC 2016


Agenda

  • What is fuzzing
  • Why fuzz
  • What to fuzz
  • How to fuzz
    • … with libFuzzer
  • Demo (CVE-2016-5179)


What is Fuzzing

  • Somehow generate a test input
  • Feed it to the code under test
  • Repeat


Why fuzz

  • Bugs specific to C/C++ that require the sanitizers to catch:
    • Use-after-free, buffer overflows, Uses of uninitialized memory, Memory leaks
  • Arithmetic bugs:
    • Div-by-zero, Int/float overflows, bitwise shifts by invalid amount
  • Plain crashes:
    • NULL dereferences, Uncaught exceptions
  • Concurrency bugs:
    • Data races, Deadlocks
  • Resource usage bugs:
    • Memory exhaustion, hangs or infinite loops, infinite recursion (stack overflows)
  • Logical bugs:
    • Discrepancies between two implementations of the same protocol (example)
    • Assertion failures


What to fuzz

  • Anything that consumes untrusted or complicated inputs:
    • Parsers of any kind (xml, pdf, truetype, ...)
    • Media codecs (audio, video, raster & vector images, etc)
    • Network protocols, RPC libraries (gRPC)
    • Crypto (boringssl, openssl)
    • Compression (zip, gzip, bzip2, brotli, …)
    • Compilers and interpreters (PHP, Perl, Python, Go, Clang, …)
    • Regular expression matchers (PCRE, RE2, libc’s regcomp)
    • Text/UTF processing (icu)
    • Databases (SQLite)
    • Browsers, text editors/processors (Chrome, OpenOffice)
  • OS Kernels (Linux), drivers, supervisors and VMs
  • UI (Chrome UI)


How to fuzz

  • Generation-based fuzzing
    • Usually a target-specific grammar-based generator
  • Mutation-based fuzzing
    • Acquire a corpus of test inputs
    • Apply random mutations to the inputs
  • Guided mutation-based fuzzing
    • Execute mutations with coverage instrumentation
    • If new coverage is observed, the mutation is permanently added to the corpus


Fuzz Target - a C/C++ function worth fuzzing

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t DataSize) {
  if (DataSize >= 4 &&  // all four bytes are read below
      Data[0] == 'F' && Data[1] == 'U' && Data[2] == 'Z' && Data[3] == 'Z')
    DoMoreStuff(Data, DataSize);
  return 0;
}


libFuzzer - an engine for guided in-process fuzzing

  • libFuzzer: a library; provides main()
  • Build your target code with extra compiler flags
  • Link your target with libFuzzer
  • Pass a directory with the initial test corpus and run

% clang++ -g my-code.cc libFuzzer.a -o my-fuzzer \
    -fsanitize=address -fsanitize-coverage=trace-pc-guard
% ./my-fuzzer MY_TEST_CORPUS_DIR


CVE-2016-5179 (c-ares, asynchronous DNS requests)

extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t DataSize) {
  unsigned char *buf;
  int buflen;
  std::string s(reinterpret_cast<const char *>(Data), DataSize);
  ares_create_query(s.c_str(), ns_c_in, ns_t_a, 0x1234, 0, &buf,
                    &buflen, 0);
  free(buf);
  return 0;
}


present perfect => present continuous

  • “The project X has been fuzzed, hence it is somewhat secure”
  • False:
    • Bug discovery techniques evolve
    • The project X evolves
    • Fuzzing is CPU intensive and needs time to find bugs
  • “The project X is being continuously fuzzed, the code coverage is monitored.”
    • Much better!


OSS-Fuzz - fuzzing as a service for OSS

Based on ClusterFuzz, the fuzzing backend used for fuzzing Chrome components

Supported engines: libFuzzer, AFL, Radamsa, ...

https://github.com/google/oss-fuzz


Q&A


Can MongoDB Recover from Catastrophe?

How I learned to crash a server

{ name : "Jonathan Abrahams",

title : "Senior Quality Engineer",

location : "New York, NY",

twitter : "@MongoDB",

facebook : "MongoDB" }


A machine may crash for a variety of reasons:

  • Machine crash: termination of virtual machine or host, hardware failure, OS failure
  • Application crash: unexpected termination of mongod


Why do we need to crash a machine?

We could abort mongod, but this would not fully simulate an unexpected crash of a machine or OS (kernel):

Immediate loss of power may prevent cached I/O from being flushed to disk.

A kernel panic can leave an application (and its data) in an unrecoverable state.


System restart → system passes h/w & s/w checks → mongod goes into recovery mode → mongod ready for client connections


How can we crash a machine?

We started by crashing the machine manually, by pulling the cord.

We evolved to using an appliance timer, which would power the machine off/on every 15 minutes.

We also figured out that setting up a cron job to send an internal crash command (more on this later) to the machine for a random period would do the job.

And then we realized, we need to do it a bit more often.


How did we really crash that machine, and can we do it over and over and over and over...?


Why do we need to do it over and over and over?

A crash of a machine may be catastrophic. In order to uncover any subtle recovery bugs, we want to repeatedly crash a machine and test if it has recovered. A failure may only be encountered 1 out of 100 times!


Ubiquiti mPower PRO to the rescue!

Programmable power device, with ssh access from LAN via WiFi or Ethernet.


How do we turn off and on the power?

ssh admin@mpower

# On the mPower, send a power cycle to the specified outlet
outlet="output1"
echo 0 > /dev/$outlet
sleep 10
echo 1 > /dev/$outlet


Physical vs. Virtual

It is necessary to test both types of machines, as machine crashes differ and the underlying host OS and hardware may provide different I/O caching and data protection. Virtual machines typically rely on shared resources, while physical machines typically use dedicated resources.


How do we crash a virtual machine?

We can crash it from the VM host:

KVM (Kernel-based VM): virsh destroy <vm>

VMware: vmrun stop <vm> hard


How do we restart a crashed VM?

We can restart it from the VM host:

KVM (Kernel-based VM): virsh start <vm>

VMware: vmrun start <vm>


How else can we crash a machine?

We can crash it using the magical SysRq key sequence (Linux only):

echo 1 | sudo tee /proc/sys/kernel/sysrq

echo b | sudo tee /proc/sysrq-trigger


How do we get the machine to restart?

Enable the BIOS setting to boot up after AC power is provided.


Restarting a Windows Machine

To disable a Windows machine from prompting you after unexpected shutdown:

bcdedit /set {default} bootstatuspolicy ignoreallfailures

bcdedit /set {current} bootstatuspolicy ignoreallfailures

bcdedit /timeout 5


The machine is running

Now that we have figured out how to get our machine to crash and restart, we restart mongod, and it goes into recovery mode.


Recovery mode of mongod

Performed automatically when mongod starts, if there was an unclean shutdown detected.

WiredTiger starts from the last stable copy of the data on disk from the last checkpoint. The journal log is then applied and a new checkpoint is created.


Before the crash!

Stimulate mongod by running several simultaneous (mongo shell) clients which provide a moderate load utilizing nearly all supported operations. This is important, as CRUD operations will cause mongod to perform I/O operations, which should never lead to file or data corruption.


Options, options

Client operations optionally provide:

Checkpoint document

Write & Read concerns

The mongod process is tested in a variety of modes, including:

Standalone or single node replica set

Storage engine, i.e., mmapv1, wiredTiger


What do we do after mongod has restarted?

After the machine has been restarted, we start mongod on a private port and it goes into recovery mode. Once that completes, we perform further client validation, via mongo (shell):

serverStatus

Optionally, run validate against all databases and collections

Optionally, verify if a checkpoint document exists

Failure to recover, connect to mongod, or perform the other validation steps is considered a test failure.


What do we do after mongod has restarted?

Now that the recovery validation has passed, we proceed with the pre-crash steps:

Stop and restart mongod on a public port

Start new set of (mongo shell) clients to perform various DB operations


Why do we care about validation?

The validate command checks the structures within a namespace for correctness by scanning the collection’s data and indexes. The command returns information regarding the on-disk representation of the collection.

Failing validation indicates that something has been corrupted, most likely due to an incomplete I/O operation during the unexpected shutdown.
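For reference, a full validation of a single collection from the mongo shell looks like this (the collection name is hypothetical, and output fields vary by server version):

// mongo shell: run a full validation of one collection
> db.mycollection.validate(true)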


Failure analysis

Since our developers could be local (NYC) or worldwide (Boston, Sydney), we want a self-service application they can use to reproduce reported failures. A bash script has been developed which can execute on both local hardware and in the cloud (AWS).

We save any artifacts useful for our developers to be able to analyze the failure:

Backup data files before starting mongod

Backup data files after mongod completes recovery

mongod and mongo (shell) log files


The crash testing helped to:

Extend our testing to scenarios not previously covered

Provide local and remote teams with tools to reproduce and analyze failures

Improve robustness of the mongod storage layer


Results, results

Storage engine bugs were discovered from the power cycle testing and led to fixes/improvements.

We have plans to incorporate this testing into our continuous integration.


Some bugs discovered

SERVER-20295 Power cycle test - mongod fails to start with invalid object size in storage.bson

SERVER-19774 WT_NOTFOUND: item not found during DB recovery

SERVER-19692 Mongod failed to open connection, remained in hung state, when running WT with LSM

SERVER-18838 DB fails to recover creates and drops after system crash

SERVER-18379 DB fails to recover when specifying LSM, after system crash

SERVER-18316 Database with WT engine fails to recover after system crash

SERVER-16702 Mongod fails during journal replay with mmapv1 after power cycle

SERVER-16021 WT failed to start with "lsm-worker: Error in LSM worker thread 2: No such file or directory"


Open issues?

Can we crash Windows using an internal command (cue the laugh track…)?


Closing remarks


Organizing committee

Alan Myrvold

Amar Amte

Andrea Dawson

Ari Shamash

Carly Schaeffer

Dan Giovannelli

David Aristizabal

Diego Cavalcanti

Jaydeep Mehta

Joe Drummey

Josephine Chandra

Kathleen Li

Lena Wakayama

Lesley Katzen

Madison Garcia

Matt Lowrie

Matthew Halupka

Sonal Shah

Travis Ellett

Yvette Nameth


London 2017


GTAC 2017

testing.googleblog.com