Reproducible and replicable Deep Learning
Tracking and automating experiments
Erik Ylipää
Linköping University
AIDA Data Hub AI Support Lead
National Bioinformatics Infrastructure Sweden (NBIS)
SciLifeLab
Goal - minimal manual experimentation
MLOps level 0: Manual process
MLOps: Continuous delivery and automation pipelines in machine learning
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
MLflow
Experiment tracking
API for tracking parameters, metrics, models, images, tables, artifacts and more
Automatic logging for many frameworks (but we will use the manual logging)
Comes with graphical viewer
Easy to analyze parameters, follow training progress and find logged artifacts
MLflow - much more
Handling project dependencies and containerization of model
Model server - easy to build prediction services
Model tracking
Other tools
Reproducibility in deep learning
Which kind of reproducibility would you prefer for machine learning experiments?
Getting the exact same results in machine learning is like watching a recording of an experiment
Good as a verification
But has the watcher reproduced the experiment in a meaningful way?
Useful machine learning results should give similar performance on my data!*
Wait, it was data all along?
Always has been.
*assuming my task is similar to the one I’m replicating
Reproducibility is too weak,
the goal should be replicability
You only have access to limited data, but you should strive to do as much with it as possible
With clinical data, we rarely have the luxury of external test sets but should try to make the most of the ones we have
Wait, it was data all along?
Always has been.
Why use canonical test sets?
Human ML researchers overfitting to the test set, inspired by Hieronymus Bosch
Recipe for sampling
Nested cross validation
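Nested cross-validation can be sketched with plain index arithmetic: the outer loop estimates performance while the inner loop selects hyperparameters, so the test fold never influences model selection. (The fold-generation helper below is illustrative, not from the slides.)

```python
# Sketch: nested cross-validation fold structure.
def k_fold_indices(n: int, k: int):
    """Yield (train, test) index lists for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

n_samples = 12
for outer_train, outer_test in k_fold_indices(n_samples, 3):
    # The inner CV sees only the outer training data
    for inner_train, inner_val in k_fold_indices(len(outer_train), 2):
        # ...fit each candidate hyperparameter setting on inner_train,
        # score it on inner_val
        pass
    # ...refit the best setting on all of outer_train,
    # then evaluate exactly once on outer_test
```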
Machine learning results should replicate independent of which human runs the experiment!
Grad student manually searching for the optimal hyperparameters - Grad Student Descent
Assume that the method will need different hyperparameters for different datasets. These should not be discovered manually.
Hyperparameter Optimization
Shahriari, Bobak, et al. "Taking the human out of the loop: A review of Bayesian optimization." Proceedings of the IEEE 104.1 (2015): 148-175.
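As a baseline against which Bayesian optimization (Shahriari et al.) improves, automated search can be as simple as random search over the hyperparameter space. The objective function and search space below are made-up placeholders for a real training-and-validation run:

```python
# Sketch: random search over a hyperparameter space.
import random

def objective(lr: float, width: int) -> float:
    """Hypothetical validation score; stands in for a real training run."""
    return -(lr - 0.1) ** 2 - 0.001 * abs(width - 64)

rng = random.Random(0)  # seeded so the search itself is reproducible
best_score, best_config = float("-inf"), None
for _ in range(50):
    config = {
        "lr": 10 ** rng.uniform(-4, 0),        # log-uniform learning rate
        "width": rng.choice([16, 32, 64, 128]),  # categorical layer width
    }
    score = objective(**config)
    if score > best_score:
        best_score, best_config = score, config
```

The same loop structure applies whether the inner objective is a toy function or a full training run; Bayesian optimization replaces the uniform sampling with a model-guided proposal.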
Taking it to the extreme
Is it worth it?
What hyperparameters to search over
The random weight initialization
Åkesson, Julius, Johannes Töger, and Einar Heiberg. "Random effects during training: Implications for deep learning-based medical image segmentation." Computers in Biology and Medicine 180 (2024): 108944.
We need to talk about random seeds
Bethard, Steven. "We need to talk about random seeds." arXiv preprint arXiv:2210.13393 (2022).
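The point made by Åkesson et al. and Bethard can be illustrated with a toy experiment: the random seed behaves like a hidden hyperparameter, and reporting a single seed hides the spread across seeds. The "training run" below is a stand-in function, not a real model:

```python
# Sketch: seed-dependent variance in results.
import random
import statistics

def noisy_score(seed: int) -> float:
    """Stand-in for a training run whose outcome depends on the seed."""
    rng = random.Random(seed)
    return 0.8 + rng.gauss(0.0, 0.02)  # toy "accuracy" with seed noise

# Same seed -> same result (reproducible), but different seeds
# give different results, so report the distribution, not one number
scores = [noisy_score(seed) for seed in range(10)]
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
```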
In summary
Thank you!
erik.ylipaa@scilifelab.se
Scientific value:
What makes a good algorithm?
Abstract: In this work we present a novel algorithm which can sort a list in O(1) time. This is considerably faster than existing algorithms, which sort lists in O(N log N) time for comparison-based sorting or O(N) for index-based sorting. We demonstrate the performance of our algorithm on the commonly used range(10) benchmark.
def constant_sort(l):
    # "Sorts" only the range(10) benchmark - the input l is ignored entirely
    return [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
What is best?