Vallari Agrawal <val.agl002@gmail.com>
Outreachy Summer Intern, 2022
Making Teuthology a Better Detective
Outreachy and GSoC
Outreachy Project
Agenda
An overview of teuthology workflow
Teuthology is an automation framework for Ceph.
Components:
A job consists of multiple tasks.
A collection of jobs makes up a test run.
Problem
Current behaviour: when a unit test task fails, teuthology throws a generic error such as CommandFailure, with no information about which test failed.
Due to this:
Error Message - Before
Solution
Create an opt-in feature to scan teuthology logs for unit-test errors.
Error Message - After
Benefits of the Solution
Enabling this feature has these effects:
Before, it took about 15 seconds to find each failing test in the logs.
Now, for a run with 20 failures, this could save about 5 minutes (20 × 15 seconds) of an engineer's time.
Instead of a generic CommandFailure error, teuthology now stores information about the failing unit test, which is more meaningful and accurate.
ErrorScanner reads a 0.5GB log file in 1.5 seconds.
Implementation
Example: nose test failure messages start with "ERROR:" or "FAIL:".
ErrorScanner does not read above this flag index to avoid re-reading.
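The scanning idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not teuthology's actual ErrorScanner: the class name `LogScanner`, its `scan` method, and the use of a byte offset to play the role of the flag index are all hypothetical.

```python
import re

# Patterns mirroring the nose example on this slide.
NOSE_PATTERNS = [re.compile(r"ERROR:\s"), re.compile(r"FAIL:\s")]


class LogScanner:
    """Illustrative scanner (hypothetical; not teuthology's ErrorScanner)."""

    def __init__(self, patterns):
        self.patterns = patterns
        # Assumed stand-in for the "flag index": bytes already scanned.
        self.offset = 0

    def scan(self, path):
        """Return the first matching line after the saved offset, or None."""
        with open(path, "rb") as log:
            log.seek(self.offset)  # skip everything read on earlier calls
            while True:
                raw = log.readline()
                if not raw:
                    break
                self.offset = log.tell()
                line = raw.decode("utf-8", errors="replace")
                # search() rather than match(), so a log prefix such as a
                # timestamp before the marker does not hide the failure.
                if any(p.search(line) for p in self.patterns):
                    return line.strip()
        return None
```

Calling `scan` again on the same file resumes from the saved offset, so lines that were already inspected are not read a second time.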
How to adopt this feature for other tests?
import re

ERROR_PATTERN = {
    "nose": [
        re.compile(r"ERROR:\s"),
        re.compile(r"FAIL:\s"),
    ],
}
scan_tests_errors = ["nose"]
remote.Run(args, scan_tests_errors=scan_tests_errors)
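As a rough sketch of what adopting the feature for another test framework might look like, the example below adds a second entry to a pattern registry and looks up patterns by framework name. The "gtest" pattern is an assumption based on GoogleTest's `[  FAILED  ]` output marker, and the helper `first_error` is hypothetical; neither is taken from teuthology's code.

```python
import re

# Illustrative registry. The "gtest" entry is an assumption derived from
# GoogleTest's "[  FAILED  ]" marker, not teuthology's actual pattern.
ERROR_PATTERN = {
    "nose": [re.compile(r"ERROR:\s"), re.compile(r"FAIL:\s")],
    "gtest": [re.compile(r"\[\s+FAILED\s+\]")],
}


def first_error(lines, scan_tests_errors):
    """Hypothetical helper: return the first line matching any pattern
    registered for the requested frameworks, or None if nothing matches."""
    patterns = [p for name in scan_tests_errors for p in ERROR_PATTERN[name]]
    for line in lines:
        if any(p.search(line) for p in patterns):
            return line.strip()
    return None
```

With this layout, supporting a new framework only needs a new key in the registry plus that key in the list passed as `scan_tests_errors`.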
Future Improvements
Right now, the feature is implemented only for nose (a Python unit-test library) and gtest (a C++ unit-test library).
References