Reproducible Data Science in the Cloud
Outline
@dwhitena, @pachydermIO, @RPILally
Why do we care about Reproducibility?
How can we achieve Reproducibility?
(at scale, in the cloud)
Demo
iris.csv
1.3,1.4,...
iris.csv
1.3,1.4,...
train.R
model.rda
model.txt
iris.csv
1.3,1.4,...
train.R
infer.R
model.rda
model.txt
[Diagram: iris.csv (1.3,1.4,...) -> train.R -> model.rda, model.txt; model.rda + 1.csv (1.3,1.4,...) -> infer.R -> 1 (setosa)]
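The train-then-infer flow in this demo can be sketched in a few lines. This is a minimal stand-in written in Python rather than the R scripts named on the slide, and it swaps in a nearest-centroid classifier purely for illustration; the slides do not say which model train.R fits. The toy rows, the function names `train` and `infer`, and the JSON model format are all assumptions.

```python
import json
import math


def train(rows):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    sums, counts = {}, {}
    for *features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def infer(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical toy rows (two features, then a label); NOT the real iris data.
TOY_ROWS = [
    (1.0, 0.2, "setosa"),
    (1.2, 0.4, "setosa"),
    (4.0, 1.2, "versicolor"),
    (4.4, 1.4, "versicolor"),
]

model = train(TOY_ROWS)
# Writing the model out as text plays the role of model.rda / model.txt:
# a separate inference step can load it without rerunning training.
model_text = json.dumps(model, sort_keys=True)
prediction = infer(json.loads(model_text), [1.1, 0.3])
print(prediction)
```

The point of the split is that the model artifact, not the training code, is the interface between the two steps, so each step can be versioned and rerun on its own.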
… enter Pachyderm
An open-source framework for distributed processing and data versioning, built on containers.
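To make "data versioning" concrete, here is a toy sketch of an append-only repository in which every commit gets an ID derived from its contents and its parent. It is an illustration of the general idea only, not a description of how Pachyderm stores data; the class `VersionedRepo`, its methods, and the placeholder file contents are all hypothetical.

```python
import hashlib


class VersionedRepo:
    """Toy append-only repo: each commit is identified by a content hash."""

    def __init__(self):
        self.commits = []    # commit ids, oldest first
        self.snapshots = {}  # commit id -> {path: bytes}

    def commit(self, files):
        """Store a snapshot of `files` and return its commit id."""
        parent = self.commits[-1] if self.commits else ""
        digest = hashlib.sha256(parent.encode())
        for path in sorted(files):
            data = files[path]
            # Length-prefix each field so different splits cannot collide.
            digest.update(f"{len(path)}:{path}:{len(data)}:".encode())
            digest.update(data)
        commit_id = digest.hexdigest()
        self.commits.append(commit_id)
        self.snapshots[commit_id] = dict(files)
        return commit_id

    def read(self, commit_id, path):
        """Return the bytes of `path` exactly as they were at `commit_id`."""
        return self.snapshots[commit_id][path]


repo = VersionedRepo()
# Placeholder bytes, not real data.
v1 = repo.commit({"iris.csv": b"first version\n"})
v2 = repo.commit({"iris.csv": b"second version\n"})
```

Because old snapshots are never overwritten, `repo.read(v1, "iris.csv")` still returns the first version after `v2` exists, which is the property a reproducible analysis relies on.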
Pachyderm
training
model
model
attributes
1.csv
inference
1
Running train.R
iris.csv
inference
Running infer.R
model.rda
model.txt
Pachyderm
training
model
model
attributes
1.csv
inference
1
Running train.R
iris.csv
Inference 1
model.rda
model.txt
Inference 2
Inference N
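The fan-out from one inference step to Inference 1 through N could be expressed as a pipeline specification along the following lines. This is a sketch, not the spec used in the talk: the field names (`pipeline`, `transform`, `parallelism_spec`, `input`, `cross`, `pfs`, `glob`) follow the Pachyderm 1.x pipeline specification as I understand it and should be checked against the docs for your version (older releases used `atom` where later ones use `pfs`). The image name, the command line, and the helper `inference_pipeline_spec` are hypothetical.

```python
import json


def inference_pipeline_spec(image, workers):
    """Build a pipeline spec dict for a parallel inference stage (assumed schema)."""
    return {
        "pipeline": {"name": "inference"},
        "transform": {
            "image": image,                  # hypothetical container image
            "cmd": ["Rscript", "infer.R"],   # hypothetical command line
        },
        "parallelism_spec": {"constant": workers},
        "input": {
            "cross": [
                # "/*" treats each top-level file (e.g. 1.csv) as its own unit
                # of work, so files can be processed by different workers.
                {"pfs": {"repo": "attributes", "glob": "/*"}},
                # "/" hands the whole model repo to every unit of work.
                {"pfs": {"repo": "model", "glob": "/"}},
            ]
        },
    }


spec = inference_pipeline_spec("example/iris-infer:latest", workers=3)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The design choice worth noticing is that parallelism is declared on the data (the glob pattern) rather than coded into infer.R, so the script itself stays a plain single-file program.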
Pachyderm
training
model
model
attributes
inference
inference
Pachyderm
training
model
model
attributes
inference
inference
plots
plots
Pachyderm
training
model
model
attributes
inference
inference
plots
plots
raw_data
training
Pachyderm
training
model
model
attributes
inference
inference
plots
plots
raw_data
training
raw_attr
attributes
#!/bin/bash
Pachyderm
training
model
model
attributes
inference
inference
plots
plots
raw_data
training
raw_attr
attributes
attributes
attributes
inference
training
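One way to read the finished diagram is as a dependency graph in which new data in an upstream repository triggers only the stages downstream of it. The sketch below encodes one such reading and works out what would need to rerun. The edges are an assumption inferred from the labels on the slide (in particular, that plots reads from inference), and `UPSTREAM` and `affected_by` are hypothetical names. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it reads from (assumed edges).
UPSTREAM = {
    "training": {"raw_data"},
    "model": {"training"},
    "attributes": {"raw_attr"},
    "inference": {"model", "attributes"},
    "plots": {"inference"},
}


def affected_by(changed, upstream=UPSTREAM):
    """Return the stages downstream of `changed`, in an order safe to rerun."""
    dirty = {changed}
    rerun = []
    # static_order() yields every stage after all of the stages it reads from.
    for node in TopologicalSorter(upstream).static_order():
        if upstream.get(node, set()) & dirty:
            dirty.add(node)
            rerun.append(node)
    return rerun


print(affected_by("raw_attr"))
print(affected_by("raw_data"))
```

Under these assumed edges, a change to raw_attr leaves training and model untouched, while a change to raw_data reruns everything from training onward.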
Conclusion/Resources