Automated Test Input Generation for Android: Are We There Yet?
Shauvik Roy Choudhary (Georgia Tech)
Alessandra Gorla (IMDEA Software Institute, Spain)
Alessandro Orso (Georgia Tech)
Partly supported by NSF, MSR, IBM Research, Google
Apps on the Play Store
83% of smartphones worldwide are Android-based
Automated Test Input Generation Techniques
Dynodroid
FSE’13
A3E
OOPSLA’13
SwiftHand
OOPSLA’13
DroidFuzzer
MoMM’13
Orbit
FASE’13
Monkey
2008
ACTEve
FSE’12
GUIRipper
ASE’12
JPF-Android
SENotes’12
PUMA
Mobisys’14
EvoDroid
FSE’14
IntentFuzzer
WODA’14
Null IntentFuzzer
2009
Tool Strategies
1. Random Exploration Strategy
Randomly selects events to exercise the app
Tools: Monkey, Dynodroid (see the sketch below)
Advantages
Drawbacks
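For concreteness, here is a minimal sketch of random exploration driven through adb’s built-in Monkey tool. The package name, event count, seed handling, and throttle value are illustrative assumptions, not settings from the study.

```python
import random
import subprocess

def random_exploration(package="com.example.app", events=5000, seed=None):
    """Fire a fixed number of pseudo-random UI events at one app via Monkey."""
    seed = seed if seed is not None else random.randint(0, 2**31 - 1)
    subprocess.run(
        ["adb", "shell", "monkey",
         "-p", package,          # restrict events to the app under test
         "-s", str(seed),        # fixed seed makes the event sequence repeatable
         "--throttle", "200",    # pause (ms) between events
         "-v", str(events)],     # verbose output, followed by the event count
        check=True,
    )

if __name__ == "__main__":
    random_exploration()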
2. Model-based Exploration Strategy
Uses a GUI model of the app to explore it systematically
Typically FSMs (states = activities, edges = events)
Tools: A3E-DF, SwiftHand, GUIRipper, PUMA (see the sketch below)
Advantages
Drawbacks
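As a rough illustration of the model-based idea, the sketch below runs a depth-first exploration over a toy finite-state model. The transition table and names are invented for illustration; the actual tools differ mainly in how GUI states are abstracted and compared.

```python
from collections import defaultdict

# Toy transition table standing in for a real app driver:
# states are screens, edges are events (all names are illustrative).
TOY_APP = {
    "Main":   {"tap_list": "List", "tap_about": "About"},
    "List":   {"tap_item": "Detail", "back": "Main"},
    "Detail": {"back": "List"},
    "About":  {"back": "Main"},
}

def explore(model, start="Main"):
    """Depth-first exploration that builds an FSM of the GUI as it goes."""
    fsm = defaultdict(dict)
    visited = set()

    def dfs(state):
        visited.add(state)
        for event, nxt in model[state].items():
            fsm[state][event] = nxt      # record the discovered transition
            if nxt not in visited:
                dfs(nxt)                 # visit unexplored states first

    dfs(start)
    return dict(fsm)

if __name__ == "__main__":
    for state, edges in explore(TOY_APP).items():
        print(state, "->", edges)
```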
3. Systematic Exploration Strategy
Uses sophisticated techniques (e.g., symbolic execution, evolutionary algorithms) to systematically explore the app
Tools: ACTEve, EvoDroid (see the constraint-solving sketch below)
Advantages
Drawbacks
Example path constraint (tap inside a widget’s bounds):
(x_left < $x < x_right) ∧ (y_top < $y < y_bottom)
→ Constraint solver →
$x = 5; $y = 10
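A constraint like the one above can be discharged with an off-the-shelf solver. Below is a minimal sketch using the Z3 SMT solver’s Python bindings; the widget bounds are made-up values, and this is not ACTEve’s actual implementation.

```python
from z3 import Ints, And, Solver, sat

# Path condition for tapping inside a widget: the solver returns concrete
# coordinates that drive execution down the corresponding path.
x, y = Ints("x y")
x_left, x_right, y_top, y_bottom = 0, 100, 0, 50   # assumed widget bounds

solver = Solver()
solver.add(And(x_left < x, x < x_right, y_top < y, y < y_bottom))

if solver.check() == sat:
    model = solver.model()
    print("tap at", model[x], model[y])   # a concrete point inside the bounds
```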
Automated Test Input Generation Techniques
Name | No Platform Instrumentation Needed | No App Instrumentation Needed | UI Events | System Events | Exploration Strategy | Testing Strategy |
Monkey | ✔ | ✔ | ✔ | ✖ | Random | Black-box |
ACTEve | ✖ | ✖ | ✔ | ✔ | Systematic | White-box |
Dynodroid | ✖ | ✔ | ✔ | ✔ | Random | Black-box |
A3E-DF | ✔ | ✖ | ✔ | ✖ | Model-based | Black-box |
SwiftHand | ✔ | ✖ | ✔ | ✖ | Model-based | Black-box |
GUIRipper | ✔ | ✖ | ✔ | ✖ | Model-based | Black-box |
PUMA | ✔ | ✔ | ✔ | ✖ | Model-based | Black-box |
Evaluation
Image Credit: Daily Alchemy
Evaluation Criteria
Mobile App Benchmarks
Combination of all subjects used in prior tool evaluations (68 apps), drawn from F-Droid and other open-source repositories
Experimental Setup
Debian host
Ubuntu guest VMs (2 cores, 6GB RAM), managed with VirtualBox and Vagrant
Android emulators (4GB RAM)
Emulator versions: v2.3 (Gingerbread), v4.1 (Jelly Bean), v4.4 (KitKat)
Tools installed on the guest VMs (launch sketch below)
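Not the study’s actual provisioning scripts, but a minimal sketch of how a single emulator instance in such a setup can be started and waited on from the guest; the AVD name and port are assumptions.

```python
import subprocess
import time

def start_emulator(avd="test_avd", port=5554):
    """Launch a headless emulator and block until Android has finished booting."""
    serial = f"emulator-{port}"
    subprocess.Popen(["emulator", "-avd", avd, "-port", str(port), "-no-window"])
    subprocess.run(["adb", "-s", serial, "wait-for-device"], check=True)
    # Poll the boot-completed system property until the OS is usable.
    while True:
        out = subprocess.run(
            ["adb", "-s", serial, "shell", "getprop", "sys.boot_completed"],
            capture_output=True, text=True,
        ).stdout.strip()
        if out == "1":
            return serial
        time.sleep(2)

if __name__ == "__main__":
    print("booted:", start_emulator())
```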
Experiment Protocol
Emma HTML reports: parse and extract statement coverage
Logcat from device: parse and extract unique stack traces via regular expressions (parsing sketch below)
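As an illustration of the trace-extraction step, here is a minimal sketch that pulls unique crash signatures out of a saved logcat dump with a regular expression. It assumes the default “brief” logcat format and is not the authors’ actual script.

```python
import re
import sys

# Crash lines in brief logcat format look like:
#   E/AndroidRuntime( 1234): FATAL EXCEPTION: main
#   E/AndroidRuntime( 1234): java.lang.NullPointerException: ...
#   E/AndroidRuntime( 1234):   at com.example.Foo.bar(Foo.java:42)
CRASH_LINE = re.compile(r"E/AndroidRuntime\(\s*\d+\):\s?(.*)")

def unique_traces(logcat_path):
    traces, block = set(), []

    def flush():
        # Key each crash block by exception type plus top stack frame,
        # so the same failure is counted once across runs.
        if block:
            frames = [l.strip() for l in block if l.lstrip().startswith("at ")]
            exception = next((l for l in block if not l.startswith("FATAL")), "")
            traces.add((exception.split(":")[0], frames[0] if frames else ""))
            block.clear()

    for line in open(logcat_path, errors="replace"):
        match = CRASH_LINE.match(line)
        if match:
            block.append(match.group(1))
        else:
            flush()
    flush()
    return traces

if __name__ == "__main__":
    for exception, frame in sorted(unique_traces(sys.argv[1])):
        print(exception, frame)
```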
Results
Image Credit: ToTheWeb
C1. Ease of Use & C2. Android Compatibility
Name | Ease of Use | OS Compatibility | Emulator/Device Compatibility |
Monkey | No effort | Any | Any |
ACTEve | Major effort | v2.3 | Emulator (custom) |
Dynodroid | No effort | v2.3 | Emulator (custom) |
A3E-Depth-first | Little effort | Any | Any |
SwiftHand | Major effort | v4.1+ | Any |
GUIRipper | Major effort | Any | Emulator |
PUMA | Little effort | v4.3+ | Any |
C3. Overall Code Coverage Achieved
C3. Coverage Analysis by Benchmark App
[Chart: statement coverage per benchmark app (Divide And Conquer, RandomMusicPlayer, k9mail, PasswordMakerPro, ...); axes: #Applications, % Coverage]
C3. Code Coverage Achieved Over Time
C4. Fault Detection Ability
Pairwise Comparison: Coverage and Failures
Coverage
Failures
Observations and Discussion
1. Random testing can be effective (somewhat surprisingly)
2. Strategy makes a difference (in the behaviors covered)
3. System events matter (in addition to UI events): broadcast receivers, intents, SMS, notifications (see the adb sketch after this list)
4. Restarts should be minimized (for efficient exploration)
5. Practical considerations matter (for practical usefulness)
5.1 Manual inputs
5.2 Initial state
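For observation 3, system events can be injected without touching the app’s UI. The sketch below does so through standard adb commands; the broadcast action is a stock Android intent, and the phone number and message are made up. This is generic tooling, not code from any of the evaluated tools.

```python
import subprocess

def inject_system_events(serial="emulator-5554"):
    """Send a couple of system-level events to a running emulator via adb."""
    adb = ["adb", "-s", serial]
    # Broadcast a stock system intent (battery low) to exercise broadcast receivers.
    subprocess.run(adb + ["shell", "am", "broadcast",
                          "-a", "android.intent.action.BATTERY_LOW"], check=True)
    # Deliver an incoming SMS through the emulator console (emulators only).
    subprocess.run(adb + ["emu", "sms", "send", "5551234", "hello"], check=True)

if __name__ == "__main__":
    inject_system_events()
```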
Open Issues for Future Work
Image Credits: Back to the Future (Universal Pictures)
1. Reproducibility (allow reproducing observed behaviors)
Image Source: http://ncp-e.com
2. Mocking and Sandboxing (support reproducibility, avoid side effects, ease testing)
Source: http://googletesting.blogspot.com
3. Find problems across platforms (address fragmentation)
Image Credit: OpenSignal
Infrastructure
http://www.cc.gatech.edu/~orso/software/androtest