TRIC: Traceability and Reproducibility through Individual Containerization
Dominic Kennedy∗, Paula Olaya∗, Jay Lofstead†, Rodrigo Vargas‡, Michela Taufer∗
∗ University of Tennessee, Knoxville, TN, USA
† Sandia National Laboratories, Albuquerque, NM, USA
‡ University of Delaware, Newark, DE, USA
NSDF All-Hands meeting October 2022
Trustworthiness in Scientific Workflows
Computational workflows play a key role in scientific discovery, and these workflows are growing more complex.
For scientists who use these workflows in their research, trust in the data, methods, software, and hardware is more necessary than ever.
[Figure: Scientific Workflow. Scientist(s) run data collection, data preprocessing, a machine learning model, and a data visualization suite on heterogeneous infrastructure.]
Traceability and Explainability
Scientists achieve trust when they can
Containerized Scientific Workflows
Fine-Grained Containerization for Workflow Trust
Plugin for Fine-Grained Containerized Workflows
We develop a Singularity/Apptainer plugin that extends the Singularity/Apptainer runtime to support fine-grained containerized workflows.
Our plugin transforms a monolithic workflow into a collection of fine-grained containers that host applications and data separately, which enables traceability and explainability.
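The decomposition can be pictured with a minimal Python sketch. It assumes a hypothetical workflow description in which each step lists its command, input datasets, and output datasets; the names `Step`, `Decomposition`, and `decompose` are illustrative and are not the plugin's API. Each step becomes one application container and each distinct dataset becomes one data container, labelled input, intermediary, or output according to whether any step produces or consumes it.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a monolithic workflow (hypothetical description format)."""
    name: str
    command: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class Decomposition:
    app_containers: dict[str, str] = field(default_factory=dict)   # step name -> command
    data_containers: dict[str, str] = field(default_factory=dict)  # dataset -> role


def decompose(steps: list[Step]) -> Decomposition:
    """One application container per step, one data container per dataset."""
    produced = {d for s in steps for d in s.outputs}
    consumed = {d for s in steps for d in s.inputs}
    result = Decomposition()
    for s in steps:
        result.app_containers[s.name] = s.command
    for d in sorted(produced | consumed):
        if d not in produced:
            role = "input"         # read by some step, never written
        elif d not in consumed:
            role = "output"        # written, never read again
        else:
            role = "intermediary"  # written by one step and read by another
        result.data_containers[d] = role
    return result


workflow = [
    Step("preprocess", "python clean.py", ["raw.csv"], ["clean.csv"]),
    Step("train", "python fit.py", ["clean.csv"], ["model.pkl"]),
]
print(decompose(workflow).data_containers)
# {'clean.csv': 'intermediary', 'model.pkl': 'output', 'raw.csv': 'input'}
```

Keeping applications and data in separate containers means each dataset can be versioned and traced on its own rather than being buried inside one monolithic image.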
Augmented Functionalities
workflow --create
Fine-Grained Workflow Creation
Our fine-grained containerized environment provides traceability and explainability.
A data container follows a file-system-in-a-file model and holds a single dataset (i.e., input, intermediary, or output data).
The application container includes the executable or script together with its software stack (i.e., OS, libraries, and packages).
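To illustrate the file-system-in-a-file idea, the sketch below packs one dataset directory into a single file and records its SHA-256 digest. Python's standard library cannot build a mountable SquashFS or ext3 image, so a tar archive is used here as a stand-in; `pack_dataset` is a hypothetical helper, not part of the plugin.

```python
import hashlib
import tarfile
from pathlib import Path


def pack_dataset(dataset_dir: Path, image_path: Path) -> str:
    """Pack one dataset directory into a single file; return its SHA-256.

    Stand-in for a file-system-in-a-file image: the tar archive keeps the
    whole directory tree inside one file, but unlike SquashFS or ext3 it
    cannot be mounted directly.
    """
    with tarfile.open(image_path, "w") as tar:
        # arcname="." stores member paths relative to the dataset root
        tar.add(dataset_dir, arcname=".")
    return hashlib.sha256(image_path.read_bytes()).hexdigest()
```

Storing such a digest next to the container's metadata would let a later run detect whether a dataset had changed.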
The execution metadata exposes a unique identifier (UUID), the container name, the creation time, the command line, and the record trail.
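One way to represent the execution metadata listed above is a small record type. The field names mirror the slide, but the types, the ISO 8601 timestamp, the JSON serialization, and the assumption that the record trail lists the UUIDs of upstream containers are illustrative choices, not the plugin's actual format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class ExecutionMetadata:
    """Fields named on the slide; types and formats are assumptions."""
    container_name: str
    command_line: str
    uuid: str = field(default_factory=lambda: str(uuid4()))
    creation_time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    # Assumed meaning: UUIDs of the containers this one was derived from
    record_trail: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


meta = ExecutionMetadata("clean.csv", "python clean.py raw.csv")
print(meta.to_json())
```

A random UUID identifies a container regardless of its name, so two runs that produce identically named outputs remain distinguishable.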
Augmented Functionalities
workflow --exec
Fine-Grained Workflow Execution
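As a hedged sketch of fine-grained execution, the function below assembles the command line for one step. It assumes that data containers are image files attached with Apptainer/Singularity's `--overlay` option (read-only inputs via the `:ro` suffix, one writable overlay for outputs); `build_exec_command` and the file names are hypothetical, and the plugin may attach data differently.

```python
def build_exec_command(app_image: str, input_images: list[str],
                       output_image: str, command: list[str],
                       runtime: str = "apptainer") -> list[str]:
    """Build argv for one step: an application container plus data overlays."""
    argv = [runtime, "exec"]
    for img in input_images:
        argv += ["--overlay", f"{img}:ro"]  # input data containers, read-only
    argv += ["--overlay", output_image]     # single writable overlay for results
    argv.append(app_image)                  # application container (e.g. a SIF file)
    argv += command                         # command run inside the container
    return argv


cmd = build_exec_command("fit.sif", ["clean.img"], "model.img",
                         ["python", "/app/fit.py"])
print(" ".join(cmd))
# apptainer exec --overlay clean.img:ro --overlay model.img fit.sif python /app/fit.py
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`; whether a writable overlay is permitted depends on how Apptainer or Singularity is installed.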
Running an Earth Science Workflow
[1] Paula Olaya, Dominic Kennedy, Ricardo Llamas, Leobardo Valera, Rodrigo Vargas, Jay Lofstead, and Michela Taufer, “Building Trust in Earth Science Findings through Data Traceability and Results Explainability,” IEEE Transactions on Parallel and Distributed Systems (TPDS).
Example metadata from a real workflow
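With per-container metadata of this kind, traceability reduces to following record trails backward. The sketch assumes, as a hypothetical format, that each container's record trail lists the UUIDs of the containers it was derived from, and walks that graph breadth-first to collect every ancestor of a result. The short labels in the example stand in for real UUIDs and are not real workflow metadata.

```python
from collections import deque


def lineage(target: str, trails: dict[str, list[str]]) -> list[str]:
    """Return every ancestor of `target`, nearest first (breadth-first).

    `trails` maps a container UUID to its record trail, assumed here to be
    the UUIDs of the containers it was derived from (hypothetical format).
    """
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(trails.get(target, []))
    while queue:
        uid = queue.popleft()
        if uid in seen:  # a container may be reached along several paths
            continue
        seen.add(uid)
        order.append(uid)
        queue.extend(trails.get(uid, []))
    return order


# Placeholder labels instead of real UUIDs.
trails = {
    "model": ["fit-app", "clean"],
    "clean": ["clean-app", "raw"],
}
print(lineage("model", trails))
# ['fit-app', 'clean', 'clean-app', 'raw']
```

Because application and data containers appear in the same trail, the walk recovers both which software and which datasets contributed to a result.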
Check Out Our Repository!
github.com/TauferLab/ContainerizedEnv