Welcome Back!
Developer Experience, FTW!
Niranjan Tulpule
Software development is being democratized
Core computing platforms are more accessible than ever
[Chart: PCs vs. Smartphones & Tablets, 1983 to 2014, 0 to 1.5M]
Free developer tools
Open Source building blocks
Free developer education
We’re lowering the barrier to becoming a developer
It’s never been easier to write apps
[Chart: Total Number of Active Apps in the App Store, 2010 to 2020, 1M to 5M]
The combined valuation of these 9 companies, treated as a country's GDP, would be in the top 50
51% Stability Issues
41% Functionality Related
7% Speed
1% Other
Classification of 1-star reviews (Sampling of Play Store reviews, May 2016)
Writing high quality apps is still hard
Compounded complexity
2.5K+ · 100+ · ~700 · 100M
Device manufacturer · Model · OS versions · Carriers → Permutations
Improving software quality & testability by investing in Developer Experience.
Develop
Release
Monitor
Firebase Test Lab
for Android
Test on your users’ devices
Use with your existing workflow
Android Studio
Command line
Jenkins
Jenkins logo by Charles Lowell and Frontside CC BY-SA 3.0 https://wiki.jenkins-ci.org/display/JENKINS/Logo
Robo crawls your app automatically
Create Espresso tests by just using your app
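Recorded flows come out as ordinary Espresso code. A rough sketch of such a test (hypothetical activity and view IDs, not actual recorder output):

import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.action.ViewActions.click;
import static android.support.test.espresso.assertion.ViewAssertions.matches;
import static android.support.test.espresso.matcher.ViewMatchers.isDisplayed;
import static android.support.test.espresso.matcher.ViewMatchers.withId;
import static android.support.test.espresso.matcher.ViewMatchers.withText;
import android.support.test.rule.ActivityTestRule;
import android.support.test.runner.AndroidJUnit4;
import org.junit.Rule;
import org.junit.Test;
import org.junit.runner.RunWith;

// Hypothetical recorded test; MainActivity and R.id.sign_in_button are placeholders.
@RunWith(AndroidJUnit4.class)
public class RecordedFlowTest {
  @Rule
  public ActivityTestRule<MainActivity> activityRule = new ActivityTestRule<>(MainActivity.class);

  @Test
  public void tapSignInShowsWelcome() {
    onView(withId(R.id.sign_in_button)).perform(click());
    onView(withText("Welcome")).check(matches(isDisplayed()));
  }
}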
Millions of Tests, and counting!
After extensive evaluation of the market, we've found that Firebase Test Lab is the best product for writing and running Espresso tests directly from Android Studio, saving us tons of time and effort around automated testing.
- Timothy West, Jet
Get actionable results at your fingertips
Develop
Release
Monitor
Firebase Test Lab
for Android
Play Pre-Launch Report
Pre-launch report
Pre-launch reports summarize issues found when testing your app on a wide range of devices
Apps using the Play Pre-Launch Report show ~20% fewer crashes!
~60% of the crashes seen on Pre-Launch Report are fixed before public rollout.
Get actionable results at your fingertips
Develop
Release
Monitor
Firebase Test Lab
for Android
Play Pre-Launch Report
Firebase Crash Reporting
Firebase Crash Reporting
Get actionable insights and comprehensive analytics whenever your users experience crashes and other errors
[Product shot: clustered errors with occurrence counts: fatal error A (6K / 7K), non-fatal error A (5K / 6K), fatal error B (4K / 4.8K), fatal error C (3K / 3K)]
Clustering
Get the big picture with comprehensive metrics on app versions, OS levels and device models
Find the exact line where the error happens
Minimize the time and effort to resolve issues with data about your users’ devices
Log custom events before an error happens
//On Android
FirebaseCrash.log("Activity created.");
//On iOS
FIRCrashLog(@"Button clicked.");
Provide more context with events leading up to an error
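For example, a sketch using the 2016-era com.google.firebase.crash API; the surrounding method and the failing call are hypothetical:

import com.google.firebase.crash.FirebaseCrash;
import java.io.IOException;

void refreshProfile() {
  try {
    loadUserProfile();  // hypothetical app call that may fail
  } catch (IOException e) {
    // Breadcrumb that will appear alongside the report
    FirebaseCrash.log("Profile refresh failed, falling back to cache.");
    // Report the caught exception as a non-fatal error
    FirebaseCrash.report(e);
  }
}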
Understand the Impact of Crashes on the Bottom Line
Fix the bug, then win them back with a timely push notification
Looking ahead
Machine learning
Compilers
Toolchains
The shift to mobile caught us by surprise...
[Chart: PCs vs. Smartphones & Tablets, 1983 to 2014]
Thank You
Docker Based Geo Dispersed Test Farm
Test Infrastructure Practice in Intel Android Program
Chen Guobing, Yu Jerry
Agenda
Taxonomies
Test Infrastructure Challenges
Test as a Service – What We Need
Anyone
Any automated Test
Any Device
Anywhere
Anytime
Target Users - Usages
Continuous Integration
Testing
QA
Release Testing
Developer
Testing
Docker Based Geo Dispersed Test Farm
Test Distribution
Test Catalog
Capability:
Platform:
Location:
Campaign A
capability: pmeter
“Run campaign A on XYZ platform in SH”
Test Distributor
Test Campaign ← Capability → Test Bench
Technical Challenges – Anywhere, Any Device
$ docker run … --device=/dev/bus/usb/001/004
--device=/dev/ttySerial0 …
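A fuller (hypothetical) invocation might look like this; the image name, bus/device paths, and entrypoint are placeholders, not the actual setup:

docker run --rm \
    --device=/dev/bus/usb/001/004 \
    --device=/dev/ttySerial0 \
    intel-test-suite:latest \
    ./run_campaign.sh --campaign A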
Technical Challenges – Anyone, Any Automated Test
Release and deliver test suites as Docker images.
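A minimal sketch of such an image; the base image, packages, and paths are assumptions, not the actual Intel setup:

# Dockerfile packaging a test suite as a deliverable image
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y android-tools-adb python
COPY test-suite/ /opt/test-suite/
ENTRYPOINT ["/opt/test-suite/run_tests.sh"]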
Questions?
Contacts:
jerry.yu@intel.com
guobing.chen@intel.com
OpenHTF
an open-source hardware testing framework
https://github.com/google/openhtf
Motivation for OpenHTF
Drastically reduce the amount of boilerplate code needed to:
exercise a piece of hardware
take measurements along the way
generate a record of the whole process
Make operator interactions simple but flexible.
Allow test engineers to focus on authoring actual test logic.
“Simplicity is requisite for reliability.” ~Edsger W. Dijkstra
Google:
A Software Company
...at least, it used to be!
Google:
Now With More Hardware!
Our Solution
A python library that provides a set of convenient abstractions for authoring hardware testing code.
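A minimal sketch (not from these slides) of what an OpenHTF test can look like, assuming the openhtf package's measurement API:

import openhtf as htf

# One phase that records a single measurement with a pass/fail validator
@htf.measures(htf.Measurement('widget_voltage').in_range(3.0, 3.6))
def voltage_phase(test):
    test.measurements.widget_voltage = 3.3  # would come from test equipment

if __name__ == '__main__':
    htf.Test(voltage_phase).execute()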
Use Cases
Manufacturing Floor
Automated Lab
Benchtop
Core Abstractions
Test
Plug
Test Equipment &
Device Under Test
Output Callback:
JSON to disk,
upload via network, etc.
Phase
Output
Record
Measurement
Tests & Phases
Plugs
Web GUI
Q&A
Detecting loop inefficiencies automatically
(to appear in FSE 2016)
Monika Dhok (IISc Bangalore, India)*
Murali Krishna Ramanathan (IISc Bangalore, India)
Software efficiency is very important
Performance issues are hard to detect during testing
These issues are found even in well-tested commercial software
Degrade application responsiveness and user experience
Performance bugs are critical
Implementation mistakes that cause inefficiency
Difficult to catch them during compiler optimizations
Fixing them can result in large speedups, thereby improving efficiency
Redundant traversal bugs
When a program iterates over a data structure repeatedly without any intermediate modifications
public class A {
  public boolean containsAny(Collection c1, Collection c2) {
    Iterator itr = c1.iterator();
    while (itr.hasNext())
      if (c2.contains(itr.next()))
        return true;
    return false;
  }
}
Complexity: O(size(c1) × size(c2))
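For comparison, one conventional fix (a general rewrite, not Glider's output) removes the repeated traversal by hashing c2 once:

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class A {
  public boolean containsAny(Collection<?> c1, Collection<?> c2) {
    Set<Object> lookup = new HashSet<>(c2);   // one traversal of c2
    for (Object element : c1) {               // one traversal of c1
      if (lookup.contains(element)) {
        return true;
      }
    }
    return false;
  }
}

Complexity drops to O(size(c1) + size(c2)).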
Performance tests are written by developers
Detecting redundant traversals
Toddler [ICSE 13]
Static analysis techniques alone are not effective
Challenges :
How to confirm the validity of the bug?�
How to expose the root cause?
Execution trace can be helpful
How to detect that the performance bug is fixed?
Automated tests not effective for performance bugs
Toddler[ICSE 13]
Challenges involved in writing performance tests
Virtual call resolution: generating tests for all possible resolutions of a method invocation is not scalable
Generating appropriate context: realization of the defect can depend on certain conditions that affect the reachability of the inefficient loop
Arrangement of elements: the problem can occur only when the data structure has large elements arranged in a particular fashion
Glider
We propose a novel and scalable approach to automatically generate tests for exposing loop inefficiencies
Glider is available online
https://drona.csa.iisc.ernet.in/~sss/tools/glider
Performance bug caught by Glider
Results
We have implemented our approach on the Soot bytecode framework and evaluated it on a number of libraries.
Our approach detected 46 bugs across 7 Java libraries, including 34 previously unknown bugs.
Tests generated using our approach significantly outperform randomly generated tests.
Questions?
NEED FOR SPEED
accelerate tests from 3 hours to 3 minutes
emo@komfo.com
600 API tests
Before: 3 hours
After: 3 minutes
The 3 Minute Goal
It’s not about the numbers or techniques you’ll see.
It’s all about continuous improvement.
Dedicated Environment
[Chart: Execution Time in Minutes: 180 → 123]
New Environment
Empty Databases
The time needed to create data for one test:
Call 12 API endpoints
Modify data in 11 tables
Takes about 1.2 seconds
And then the test starts
[Chart: Execution Time in Minutes: 180 → 123 → 89]
Empty Databases
Simulate Dependencies
+Some More
[Diagram: stubs surrounding the Core API]
Stub all external dependencies
Core API
Transparent
Fake SSL certs
Dynamic Responses
Local Storage
Return Binary Data
Regex URL match
Existing Tools (March 2016)
Stubby4J
WireMock
Wilma
soapUI
MockServer
mountebank
Hoverfly
Mirage
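For a sense of what stubbing one dependency looks like, here is a sketch using WireMock's Java DSL (one of the tools above); the port, endpoint, and payload are made up:

import static com.github.tomakehurst.wiremock.client.WireMock.*;
import com.github.tomakehurst.wiremock.WireMockServer;

// In test setup:
WireMockServer socialApiStub = new WireMockServer(8089);
socialApiStub.start();
socialApiStub.stubFor(get(urlMatching("/v1/social/.*"))
    .willReturn(aResponse()
        .withStatus(200)
        .withHeader("Content-Type", "application/json")
        .withBody("{\"status\":\"ok\"}")));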
We created project Nagual,
open source soon.
[Chart: Execution Time in Minutes: 180 → 123 → 89 → 65]
Stub Dependencies
Move to Containers
[Chart: Execution Time in Minutes: 180 → 123 → 89 → 65 → 104]
Using Containers
Run Databases in Memory
[Chart: Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 → 61]
Run Databases in Memory
Don’t Clean Test Data
[Chart: Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 → 61 → 46]
Don’t delete test data
Run in Parallel
| Parallel batches | 4 | 6 | 8 | 10 | 12 | 14 | 16 |
| Time to execute (min) | 12 | 9 | 7 | 5 | 8 | 12 | 17 |
The Sweet Spot
[Chart: Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 → 61 → 46 → 5]
Run in Parallel
Equalize Workload
[Chart: Execution Time in Minutes: 180 → 123 → 89 → 65 → 104 → 61 → 46 → 5 → 3]
Equal Batches
Run in Parallel
Don’t delete test data
Run Databases in Memory
Using Containers
Stub Dependencies
Empty Databases
New Environment
After Hardware Upgrade
The Outcome
2:15 min.
1:38 min.
The tests are slow
The tests are unreliable
The tests can’t exactly pinpoint the problem
High Level Tests Problems
3 Minutes
No external dependencies
It’s cheap to run all tests after every change
In a couple of years, running all your automated tests after every code change in under 3 minutes will be standard development practice.
Recommended Reading
EmanuilSlavov.com
@EmanuilSlavov
Slide #, Photo Credits
1. https://www.flickr.com/photos/thomashawk
5. https://www.flickr.com/photos/100497095@N02
7. https://www.flickr.com/photos/andrewmalone
10. https://www.flickr.com/photos/astrablog
14. https://www.flickr.com/photos/foilman
16. https://www.flickr.com/photos/missusdoubleyou
18. https://www.flickr.com/photos/canonsnapper
20. https://www.flickr.com/photos/anotherangle
23. https://www.flickr.com/photos/-aismist
Code Coverage is a Strong Predictor of
Test Suite Effectiveness
in the Real World
Rahul Gopinath
Iftekhar Ahmed
When should we stop testing?
How to evaluate test suite effectiveness?
Previous research: Do not trust coverage
(In theory)
GTAC’15 Inozemtseva
Factors affecting test suite quality
Test suite quality
Coverage
Assertions
According to previous research
Test suite quality
Coverage
Assertions
Test suite size
GTAC’15 Inozemtseva
But...
What is the adequate test suite size?
Can I use coverage to measure
suite effectiveness?
Statement coverage best predicts mutation score
A fault in a statement has 87% probability of being detected
if an organic test covers it.
M = 0.87 × S
Dot size reflects project size
R² = 0.94
Results from 250 real world programs
largest > 100 KLOC
On developer-written test suites
Statement coverage best predicts mutation score
A fault in a statement has 61% probability of being detected
if a generated test covers it.
M = 0.61 × S
Dot size reflects project size
R² = 0.70
Results from 250 real world programs
largest > 100 KLOC
On Randoop-generated test suites
But
Controlling for test suite size, coverage provides little extra information.
Hence don't use coverage [GTAC’15 Inozemtseva]
Why use mutation?
Mutation score provides little extra information (<6%) compared to coverage.
Does coverage have no extra value?
| | GTAC’15 Inozemtseva | Our Research |
| # Programs | 5 | 250 |
| Selection of programs | Ad hoc | Systematic sample from Github |
| Tool used | CodeCover, PIT | Emma, Cobertura, CodeCover, PIT |
| Test suites | Random subsets of original | Organic & Randomly generated |
| Removal of influence of size (new results) | Ad hoc | Statistical |
Our study is much larger, systematic (not ad hoc), and follows real-world usage
| | Our Research (new results) |
| M ~ TestsuiteSize | 12.84% |
| M ~ log(TSize) | 51.26% |
| residuals(M ~ log(TSize)) ~ S | 75.25% |
Statement coverage can explain 75% variability in mutation score after eliminating influence of test suite size.
Is mutation analysis better than coverage analysis?
Mutation analysis: High cost of analysis
Δ = b² - 4ac
d = b^2 + 4 * a * c;
d = b^2 * 4 * a * c;
d = b^2 / 4 * a * c;
d = b^2 ^ 4 * a * c;
d = b^2 % 4 * a * c;
d = b^2 << 4 * a * c;
d = b^2 >> 4 * a * c;
d = b^2 * 4 + a * c;
d = b^2 * 4 - a * c;
d = b^2 * 4 / a * c;
d = b^2 * 4 ^ a * c;
d = b^2 * 4 % a * c;
d = b^2 * 4 << a * c;
d = b^2 * 4 >> a * c;
d = b^2 * 4 * a + c;
d = b^2 * 4 * a - c;
d = b^2 * 4 * a / c;
d = b^2 * 4 * a ^ c;
d = b^2 * 4 * a % c;
d = b^2 * 4 * a << c;
d = b^2 * 4 * a >> c;
d = b + 2 - 4 * a * c;
d = b - 2 - 4 * a * c;
d = b * 2 - 4 * a * c;
d = b / 2 - 4 * a * c;
d = b % 2 - 4 * a * c;
d = b^0 - 4 * a * c;
d = b^1 - 4 * a * c;
d = b^-1 - 4 * a * c;
d = b^MAX - 4 * a * c;
d = b^MIN - 4 * a * c;
d = b^2 - 0 * a * c;
d = b^2 - 1 * a * c;
d = b^2 - (-1) * a * c;
d = b^2 - MAX * a * c;
d = b^2 - MIN * a * c;
Mutation score is very costly
Mutation analysis: Equivalent mutants
Δ = b² - 2²ac
d = b^2 - (2^2) * a * c;
d = b^2 - (2*2) * a * c;
d = b^2 - (2+2) * a * c;
Mutants
Original
Equivalent Mutant
Normal Mutant
Or: Do not trust low mutation scores
Low mutation score does not indicate a low quality test suite.
Mutation analysis: Equivalent mutants
Δ = b² - 4ac
d = b^2 - (-4) * a * c;
d = b^2 + 4 * a * c;
d = (-b)^2 - 4 * a * c;
Mutants
Original
Equivalent Mutant
Redundant Mutant
Or: Do not trust high mutation scores
High mutation score does not indicate a high quality test suite.
Mutation Analysis: Different Operators
Δ = b² - 4ac
d = b^2 + 4 * a * c;
>>> dis.dis(d)
2 0 LOAD_FAST 0 (b)
3 LOAD_CONST 1 (2)
6 LOAD_CONST 2 (4)
9 LOAD_FAST 1 (a)
12 BINARY_MULTIPLY
13 LOAD_FAST 2 (c)
16 BINARY_MULTIPLY
17 BINARY_SUBTRACT
18 BINARY_XOR
19 RETURN_VALUE x
[2016 Software Quality Journal]
Mutation score is not a consistent measure
Does a high coverage test suite
actually prevent bugs?
We looked at bugfixes on actual programs
An uncovered line is twice as likely to have a bug fix as a line covered by any test case.
| [FSE 2016] | Covered | Uncovered | p |
| Statement | 0.68 | 1.20 | 0.00 |
| Block | 0.42 | 0.83 | 0.00 |
| Method | 0.40 | 0.87 | 0.00 |
| Class | 0.45 | 0.32 | 0.10 |
Difference in bug fixes between covered and uncovered program elements
Does a high coverage test suite
actually prevent bugs?
Yes it does
Summary
Do not dismiss coverage lightly
Beware of mutation analysis caveats
Coverage is a pretty good heuristic on where the bugs hide.
Assume non-equivalent, non-redundant, uniform fault distribution for mutants at one’s own peril.
Beware of theoretical spherical cows…
Backup slides
That is,
Mutation X Path Coverage
Mutation X Branch Coverage
Computations
require(Coverage)
data(o.db)
o <- subset(subset(o.db, tloc != 0), select=c('pit.mutation.cov', 'cobertura.line.cov', 'loc', 'tloc'))
o$l.tloc <- log2(o$tloc)
oo <- subset(o, l.tloc != -Inf)
ooo <- na.omit(oo)
> cor.test(pit.mutation.cov,tloc)
t = 1.973, df = 232, p-value = 0.04969
95 percent confidence interval: 0.0002148688 0.2525430013
sample estimates: cor 0.1284574
> cor.test(pit.mutation.cov,l.tloc)
data: pit.mutation.cov and l.tloc
t = 9.0938, df = 232, p-value < 2.2e-16
95 percent confidence interval: 0.4114269 0.6013377
sample estimates: cor 0.5126249
> cor.test(resid(lm(pit.mutation.cov~log(tloc))),cobertura.line.cov)
data: resid(lm(pit.mutation.cov ~ log(tloc))) and cobertura.line.cov
t = 17.406, df = 232, p-value < 2.2e-16
95 percent confidence interval: 0.6909857 0.8032663
sample estimates: cor 0.7525441
> summary(lm(pit.mutation.cov~log(tloc)))
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.13644 0.06031 -2.262 0.0246 *
log(tloc) 0.09950 0.01094 9.094 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2839 on 232 degrees of freedom
Multiple R-squared: 0.2628, Adjusted R-squared: 0.2596
F-statistic: 82.7 on 1 and 232 DF, p-value: < 2.2e-16
> summary(lm(pit.mutation.cov~log(tloc)+cobertura.line.cov))
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.074859 0.031645 -2.366 0.018828 *
log(tloc) 0.023658 0.006487 3.647 0.000328 ***
cobertura.line.cov 0.785488 0.031628 24.836 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1485 on 231 degrees of freedom
Multiple R-squared: 0.7991, Adjusted R-squared: 0.7974
F-statistic: 459.5 on 2 and 231 DF, p-value: < 2.2e-16
Does Mutation score correlate to fixed bugs?
Mutant semiotics (how faults map to failures) is not well understood
Affected by factors of the particular project
Can weak mutation analysis help?
Rather than the failure of a test case for a mutant, we only require a change in state. It is easier to compute, but:
Method
250 real world projects from Github, largest > 100 KLOC.
Tests
Developer written
Randoop generated
| | Statement | Branch | Path | Mutation |
| Emma | X | | | |
| Cobertura | X | X | | |
| Codecover | X | X | | |
| JMockit | X | | X | |
| PIT | X | | | X |
| Major | | | | X |
| Judy | | | | X |
Mutation analysis has a number of other problems
The fault distribution may not be uniform
A majority of mutants are very easy to kill, but some are stubborn.
Do two test suites with, say, a 50% mutation score have the same strength?
Test suites optimized for harder-to-detect faults are penalized.
Correlation does not imply causation?
It was pointed out in the previous talk that correlation between coverage and mutation score does not imply a causal relationship between the two. We can counter it by:
Logic
A test suite with zero coverage will not kill any mutants.
A test suite can only kill mutants on the lines it covers.
Statistically
Using additive noise models to identify cause and effect. (ongoing research)
ClusterRunner
Making fast test-feedback easy through horizontal scaling.
Joseph Harrington and Taejun Lee Productivity Engineering
What is ClusterRunner?
Functional
Tests
Integration Tests
Unit Tests
Manual
Tests
Develop
Test
Feature
Design
Release
Develop
Test
Feature
Design
Release
PHPUnit testsuite duration at Box
“A problem isn’t a problem if you can throw money at it.”
PHPUnit
Scala SBT
nosetests
QUnit
JUnit
Requirements
Easy to configure and use
Test technology agnostic
Fast test feedback
www.ClusterRunner.com
Our 30-hour testsuite
17 minutes
ClusterRunner in Action
Bring up a Cluster
# On master.box.com
clusterrunner master --port 43000

# On slave1.box.com, slave2.box.com
clusterrunner slave --master-url master.box.com:43000
Bring up a Cluster
http://master.box.com:43000/v1/slave/
Set up Your Project
Set up Your Project
> phpunit ./test/php/EarthTest.php
> phpunit ./test/php/WindTest.php
> phpunit ./test/php/FireTest.php
> phpunit ./test/php/WaterTest.php
> phpunit ./test/php/HeartTest.php
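Those per-file commands come from the project's job definition (typically a clusterrunner.yaml in the repo). The sketch below is from memory of ClusterRunner's docs; the exact field names may differ, so treat it as illustrative only:

PHPUnit:
  commands:
    - phpunit $TEST_FILE
  atomizers:
    - TEST_FILE: find ./test/php -name "*Test.php"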
Execute a Build
Now we’re ready to build!
clusterrunner build --master-url master.box.com:43000 \
  git --url http://github.com/myproject --job-name PHPUnit
View Build Results
http://master.box.com:43000/v1/build/1/
View Build Results
http://master.box.com:43000/v1/build/1/subjob/
View Build Results
http://master.box.com:43000/v1/build/1/result
What’s next for ClusterRunner
clusterrunner.com
Get Involved!
productivity@box.com
Contact Us
Multi-device Testing
E2E test infra for mobile products of today and tomorrow
angli@google.com
adorokhine@google.com
Overview
E2E testing challenges
Introducing Mobly
Sample test
Controlling Android devices
Custom controller
Demo
E2E Testing
Unit Tests
Integration/Component Tests
E2E Tests
Testing Pyramid
Where magic dwells
E2E Testing is Important
Applications involving multiple devices
P2P data transfer, nearby discovery
Product under test is not a conventional device.
Internet-Of-Things, VR
Need to control and vary physical environment
RF: Wi-Fi router, attenuators
Lighting, physical position
Interact with other software/cloud services
iPerf server, cloud service backend, network components
E2E Testing is Hard!
Most test frameworks are for single-device app testing
Need to trigger complex actions on devices
Some may need system privilege
Need to synchronize steps between multiple devices
Logic may be centralized (hard to write) or decentralized (hard to trigger)
Need to drive a wide range of equipment
attenuator, call box, power meter, wireless AP etc
Need to communicate with cloud services
Need to collect debugging artifacts from many sources
Our Solution - Mobly
Lightweight Python framework (Py2/3 compatible)
Test logic runs on a host machine
Controls a collection of devices/equipment in a test bed
Bundled with controller library for essential equipment
Android device, power meter, etc
Flexible and pluggable
Custom controller module for your own toys
Open source and ready to go!
Mobly Architecture
Test Bed
Computer
Mobly
Test Script
Mobile
Device
Network Switch
Attenuator
Call Box
Cloud Service
Test Harness
Test bed allocation, device provisioning, and results aggregation
Sample Tests
Hello from the other side
[Slide art: walls of repeated HELLO toasts]
Describe a Test Bed
{
  'testbed': [{
    'name': 'SimpleTestBed',
    'AndroidDevice': '*'
  }],
  'logpath': '/tmp/mobly_logs'
}
Test Script - Hello!
from mobly import base_test
from mobly import test_runner
from mobly.controllers import android_device

class HelloWorldTest(base_test.BaseTestClass):
    def setup_class(self):
        self.ads = self.register_controller(android_device)
        self.dut1 = self.ads[0]

    def test_hello_world(self):
        self.dut1.sl4a.makeToast('Hello!')

if __name__ == '__main__':
    test_runner.main()
Invocation:
$ ./path/to/hello_world_test.py -c path/to/config.json
Beyond the Basics
Config:
{
  'testbed': [{
    ...
  }],
  'logpath': '/tmp/mobly_logs',
  'toast_text': 'Hey there!'
}

Code:
self.user_params['toast_text'] # 'Hey there!'
Beyond the Basics
Device specific logger
self.caller.log.info("I did something.")
# <timestamp> [AndroidDevice|<serial>] I did something
In test bed config:
'AndroidDevice': [{'serial': 'xyz', 'label': 'caller'},
                  {'serial': 'abc', 'label': 'callee', 'phone_number': '123456'}]
In code:
self.callee = android_device.get_device(self.ads, label='callee')
self.callee.phone_number  # '123456'
Specific device info
Controlling Android Devices
adb/shell
UI
API Calls
Custom Java Logic
Controlling Android Devices
adb
ad.adb.shell('pm clear com.my.package')
UI automator
ad.uia = uiautomator.Device(serial=ad.serial)
ad.uia(text='Hello World!').wait.exists(timeout=1000)
Android API calls, including system/hidden APIs, via SL4A
ad.sl4a.wifiConnect({'SSID': 'GoogleGuest'})
Custom Java logic
ad.register_snippets('trigger', 'com.my.package.snippets')
ad.trigger.myImpeccableLogic(5)
System API Calls
> self.dut.sl4a.makeToast('Hello World!')
SL4A (Scripting Layer for Android) is an RPC service exposing API calls on Android
self.dut.api is the RPC client for SL4A.
Original version works on regular Android builds.
Fork in AOSP can make direct system privileged calls (system/hidden APIs).
Custom Snippets
SL4A is not sufficient
SL4A methods are mapped to Android APIs, but tests need more than just Android API calls.
Current AOSP SL4A requires system privilege
Custom snippets allow users to define custom methods that do anything they want.
Custom snippets can be used with other useful libs like Espresso
Custom Snippets
package com.mypackage.testing.snippets.example;
public class ExampleSnippet implements Snippet {
  public ExampleSnippet(Context context) {}

  @Rpc(description = "Returns a string containing the given number.")
  public String getFoo(Integer input) {
    return "foo " + input;
  }

  @Override
  public void shutdown() {}
}
Custom Snippets
Add your snippet classes to AndroidManifest.xml for the androidTest apk
<meta-data
    android:name="mobly-snippets"
    android:value="com.my.app.test.MySnippet1,
                   com.my.app.test.MySnippet2" />
Compile it into an apk
apply plugin: 'com.android.application'

dependencies {
  androidTestCompile 'com.google.android.mobly:snippetlib:0.0.1'
}
Custom Snippets
Install the apk on your device
Load and call it
ad.load_snippets(name='snippets',
                 package='com.mypackage.testing.snippets.example')
foo = ad.snippets.getFoo(2)  # 'foo 2'
Espresso in Custom Snippets
import static android.support.test.espresso.Espresso.onView;
import static android.support.test.espresso.action.ViewActions.swipeUp;
import static android.support.test.espresso.matcher.ViewMatchers.withId;

public class ExampleSnippet implements Snippet {
  public ExampleSnippet(Context context) {}

  @Rpc(description = "Performs a swipe using Espresso")
  public void performSwipe() {
    onView(withId(R.id.my_view_id)).perform(swipeUp());
  }
}
Custom Controllers
Plug in your own toys
Loose Controller Interface
def create(configs):
    '''Instantiate controller objects.'''

def destroy(objects):
    '''Destroy controller objects.'''

def get_info(objects):
    '''[optional] Get controller info for test summary.'''
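A hypothetical controller module implementing that interface; the "car" name mirrors the usage example on the next slide, and nothing here is part of Mobly itself:

class Car(object):
    def __init__(self, config):
        self.config = config

    def drive(self):
        print('Driving car %s' % self.config)

def create(configs):
    # Instantiate one controller object per config entry
    return [Car(config) for config in configs]

def destroy(cars):
    # Release any hardware handles here
    pass

def get_info(cars):
    # Optional: surfaced in the test summary
    return [car.config for car in cars]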
Using Custom Controllers
from my.project.testing.controllers import car

def setup_class(self):
    self.cars = self.register_controller(car)

def test_something(self):
    self.cars[0].drive()
Video Demo
Video Demo
Coming Soon
iOS controller libs
Dependent on libimobiledevice
KIFTest, XCTest, XCUITest
Async events in snippets
Standard snippet and python utils for basic Android operations
Support non-Nexus Android devices
Thank You!
Questions?
Scale vs Value
Test Automation at the BBC
David Buckhurst & Jitesh Gosai
Lots of innovation
Chair hive
Live Insights & Operational Notifications
Scale vs Value
www.bbc.co.uk/opensource
@BBCOpenSource
@davidbuckhurst @JitGo
Finding bugs in
C/C++ libraries using
libFuzzer
Kostya Serebryany, GTAC 2016
Agenda
What is Fuzzing
Why fuzz
What to fuzz
How to fuzz
Fuzz Target - a C/C++ function worth fuzzing
extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t DataSize) {
if (DataSize >= 3 &&
Data[0]=='F' &&
Data[1]=='U' &&
Data[2]=='Z' &&
Data[3]=='Z')
DoMoreStuff(Data, DataSize);
return 0;
}
libFuzzer - an engine for guided in-process fuzzing
% clang++ -g my-code.cc libFuzzer.a -o my-fuzzer \
-fsanitize=address -fsanitize-coverage=trace-pc-guard
% ./my-fuzzer MY_TEST_CORPUS_DIR
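Common libFuzzer flags can be appended to the same invocation, for example to cap input size and run several jobs in parallel (values here are arbitrary):

% ./my-fuzzer -max_len=64 -jobs=8 MY_TEST_CORPUS_DIR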
CVE-2016-5179 (c-ares, asynchronous DNS requests)
extern "C"
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t DataSize) {
unsigned char *buf; int buflen;
std::string s(reinterpret_cast<const char *>(Data), DataSize);
ares_create_query(s.c_str(), ns_c_in, ns_t_a, 0x1234, 0, &buf,
&buflen, 0);
free(buf);
return 0;
}
OSS-Fuzz - fuzzing as a service for OSS
Based on ClusterFuzz, the fuzzing backend used for fuzzing Chrome components
Supported engines: libFuzzer, AFL, Radamsa, ...
Q&A
Can MongoDB Recover from Catastrophe?
How I learned to crash a server
{ name : "Jonathan Abrahams",
title : "Senior Quality Engineer",
location : "New York, NY",
twitter : "@MongoDB",
facebook : "MongoDB" }
A machine may crash for a variety of reasons:
Machine crash
Unexpected termination of mongod
Application crash
Why do we need to crash a machine?
We could abort mongod, but this would not fully simulate an unexpected crash of a machine or OS (kernel):
Immediate loss of power may prevent cached I/O from being flushed to disk.
A kernel panic can leave an application (and its data) in an unrecoverable state.
System passes h/w & s/w checks
mongod goes into recovery mode
mongod ready for client connection
System restart
How can we crash a machine?
We started by crashing the machine manually, by pulling the cord.
We evolved to using an appliance timer, which would power the machine off/on every 15 minutes.
We also figured out that setting up a cron job to send an internal crash command (more on this later) to the machine for a random period would do the job.
And then we realized, we need to do it a bit more often.
How did we really crash that machine, and can we do it over and over and over and over...?
Why do we need to do it over and over and over?
A crash of a machine may be catastrophic. In order to uncover any subtle recovery bugs, we want to repeatedly crash a machine and test if it has recovered. A failure may only be encountered 1 out of 100 times!
Ubiquiti mPower PRO to the rescue!
Programmable power device, with ssh access from LAN via WiFi or Ethernet.
How do we turn off and on the power?
ssh admin@mpower
local outlet="output1"
# Send power cycle to mFi mPower to specified outlet
echo 0 > /dev/$outlet
sleep 10
echo 1 > /dev/$outlet
Physical vs. Virtual
It is necessary to test both types of machines, as machine crashes differ and the underlying host OS and hardware may provide different I/O caching and data protection. Virtual machines typically rely on shared resources, while physical machines typically use dedicated resources.
How do we crash a virtual machine?
We can crash it from the VM host:
KVM (Kernel-based VM): virsh destroy <vm>
VmWare: vmrun stop <vm> hard
How do we restart a crashed VM?
We can restart it from the VM host:
KVM (Kernel-based VM): virsh start <vm>
VmWare: vmrun start <vm>
How else can we crash a machine?
We can crash it using the magical SysRq key sequence (Linux only):
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo b | sudo tee /proc/sysrq-trigger
How do we get the machine to restart?
Enable the BIOS setting to boot up after AC power is provided.
Restarting a Windows Machine
To disable a Windows machine from prompting you after unexpected shutdown:
bcdedit /set {default} bootstatuspolicy ignoreallfailures
bcdedit /set {current} bootstatuspolicy ignoreallfailures
bcdedit /timeout 5
The machine is running
Now that we have figured out how to get our machine to crash and restart, we restart mongod and it goes into recovery mode.
Recovery mode of mongod
Performed automatically when mongod starts, if there was an unclean shutdown detected.
WiredTiger starts from the last stable copy of the data on disk from the last checkpoint. The journal log is then applied and a new checkpoint is created.
Before the crash!
Stimulate mongod by running several simultaneous (mongo shell) clients which provide a moderate load utilizing nearly all supported operations. This is important, as CRUD operations will cause mongod to perform I/O operations, which should never lead to file or data corruption.
Options, options
Client operations optionally provide:
Checkpoint document
Write & Read concerns
The mongod process is tested in a variety of modes, including:
Standalone or single-node replica set
Storage engine, i.e., MMAPv1 or WiredTiger
What do we do after mongod has restarted?
After the machine has been restarted, we start mongod on a private port and it goes into recovery mode. Once that completes, we perform further client validation, via mongo (shell):
serverStatus
Optionally, run validate against all databases and collections
Optionally, verify if a checkpoint document exists
Failure to recover, connect to mongod, or perform the other validation steps is considered a test failure.
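Roughly, those checks look like this from the mongo shell (the database and collection names are hypothetical):

// Basic health check after recovery completes
db.serverStatus()
// Optional full validation of a collection's data and indexes
db.getSiblingDB("test").crash_collection.validate(true)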
What do we do after mongod has restarted?
Now that the recovery validation has passed, we will proceed with the pre-crash steps:
Stop and restart mongod on a public port
Start new set of (mongo shell) clients to perform various DB operations
Why do we care about validation?
The validate command checks the structures within a namespace for correctness by scanning the collection’s data and indexes. The command returns information regarding the on-disk representation of the collection.
Failing validation indicates that something has been corrupted, most likely due to an incomplete I/O operation during the unexpected shutdown.
Failure analysis
Since our developers could be local (NYC) or worldwide (Boston, Sydney), we want a self-service application they can use to reproduce reported failures. A bash script has been developed which can execute on both local hardware and in the cloud (AWS).
We save any artifacts useful for our developers to be able to analyze the failure:
Backup data files before starting mongod
Backup data files after mongod completes recovery
mongod and mongo (shell) log files
The crash testing helped to:
Extend our testing to scenarios not previously covered
Provide local and remote teams with tools to reproduce and analyze failures
Improve robustness of the mongod storage layer
Results, results
Storage engine bugs were discovered from the power cycle testing and led to fixes/improvements.
We have plans to incorporate this testing into our continuous integration.
Some bugs discovered
SERVER-20295 Power cycle test - mongod fails to start with invalid object size in storage.bson
SERVER-19774 WT_NOTFOUND: item not found during DB recovery
SERVER-19692 Mongod failed to open connection, remained in hung state, when running WT with LSM
SERVER-18838 DB fails to recover creates and drops after system crash
SERVER-18379 DB fails to recover when specifying LSM, after system crash
SERVER-18316 Database with WT engine fails to recover after system crash
SERVER-16702 Mongod fails during journal replay with mmapv1 after power cycle
SERVER-16021 WT failed to start with "lsm-worker: Error in LSM worker thread 2: No such file or directory"
Open issues?
Can we crash Windows using an internal command (cue the laugh track…)?
Closing remarks
Organizing committee
Alan Myrvold
Amar Amte
Andrea Dawson
Ari Shamash
Carly Schaeffer
Dan Giovannelli
David Aristizabal
Diego Cavalcanti
Jaydeep Mehta
Joe Drummey
Josephine Chandra
Kathleen Li
Lena Wakayama
Lesley Katzen
Madison Garcia
Matt Lowrie
Matthew Halupka
Sonal Shah
Travis Ellett
Yvette Nameth
London 2017
GTAC 2017
testing.googleblog.com