There and back again: a short introduction to virtualization technologies
Python for Psychologists - Winter term 2022
Peer Herholz (he/him)
Research affiliate - NeuroDataScience-ORIGAMI lab at MNI, MIT, McGill & BRAMS
Member - BIDS, ReproNim, Brainhack, UNIQUE, CNeuroMod
@peerherholz
28/10/2022
Michael Ernst
PhD student - Neurocognitive Psychology at Goethe University Frankfurt
M-earnest
Recap of the last session
GUI vs. CLI - an example
https://giphy.com/gifs/colbertlateshow-stephen-colbert-surprise-late-show-l0HlO3BJ8LALPW4sE
We will dive right in!
Everyone gets 5 min to do the following:
“addams_family/pugsley”
“addams_family/gomez”
“addams_family/fester”
New session - old and new “friends”
Open a shell/terminal.
How do you check where you currently are with respect to paths on your machine?
pwd
How can one check the contents of a given directory?
ls
How can you navigate/move to the Desktop?
cd /path/to/your/desktop
How do you create a directory and a file?
mkdir my_cool_directory
touch my_cool_file.txt (or my_cool_file.py)
Path for Windows users (WSL):
/Users/user_name -> /mnt/c/Users/user_name
Further questions:
Are paths identical across different machines & OS?
Are BASH commands identical across machines & OS?
Why should you avoid spaces in directory & file names?
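For comparison, the same shell basics can be sketched in Python with the standard-library pathlib and os modules. This is a minimal sketch, not part of the original exercise. The function names shell_basics_demo and to_wsl_path are hypothetical, and the WSL mapping assumes the default /mnt/<drive letter> mount point.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path, PureWindowsPath


def shell_basics_demo(base: Path) -> list[str]:
    """Mirror pwd, cd, mkdir, touch and ls inside `base` (hypothetical helper)."""
    previous = Path.cwd()                      # pwd: remember where we started
    try:
        os.chdir(base)                         # cd /path/to/your/desktop
        print("Now in:", Path.cwd())           # pwd
        folder = Path("my_cool_directory")
        folder.mkdir(exist_ok=True)            # mkdir my_cool_directory
        (folder / "my_cool_file.txt").touch()  # touch my_cool_file.txt
        # ls my_cool_directory (sorted for a stable order)
        return sorted(p.name for p in folder.iterdir())
    finally:
        os.chdir(previous)                     # cd back to the starting directory


def to_wsl_path(windows_path: str) -> str:
    """Map a Windows path such as C:\\Users\\user_name to its WSL form.

    Assumes the default WSL mount point /mnt/<drive letter>.
    """
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":").lower()     # "C:" -> "c"
    return "/".join(["/mnt", drive, *path.parts[1:]])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(shell_basics_demo(Path(tmp)))    # ['my_cool_file.txt']
    print(to_wsl_path(r"C:\Users\user_name"))  # /mnt/c/Users/user_name
```

Note that the Python version behaves the same on macOS, Linux and Windows, while the shell commands (and the paths) differ between systems.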
Recap of the last session
creating/moving/handling of files & directories
Your expectations for this session
What are your expectations for this session?
https://i.imgflip.com/2w2rmv.jpg
Objectives for this session
https://media.makeameme.org/created/look-at-the-25f4b725ed.jpg
Outline for this session
https://twitter.com/OfficeMemes_/status/1298982572848869380/photo/1
Standardization/Virtualization
The course I - Saving science?
https://giphy.com/gifs/twitter-3bznFj6OB5381BEjDu
https://reproducibilitea.org/
Figure: dimensions of (academic) research, organized by scale, researcher and project (labels: single researcher/lab, consortia, student, PI, collaboration, workflow transfer, dataset transfer, reproducibility, continuation)
Introduction to virtualization
The problem statement
Once
VIRTUALIZATION WARS
The problem statement
Imagine you want to conduct an analysis of some demographic data, including obtaining & reading data, filtering & descriptive analyses of data, inferential statistics and visualization.
A colleague has a ready-to-go Python script that does all of these things and shares it with you.
Everything is ok…
“fancy_analyzes.py”
“python fancy_analyzes.py”
The problem statement
Waaaaaiiiit a hot Montreal minute!
The script doesn’t run? The script leads to different results? What went wrong?
Let’s gather some errors here ...
The problem statement
What is happening?
*adapted from Felix Schönbrodt
The problem statement
Glatard et al. (2015): Reproducibility of neuroimaging analyses across operating systems
The problem statement
FreeSurfer: Inter-Build Differences
Surface maps of mean absolute difference, standard deviation of absolute difference, t-statistics and RFT significance values showing regions where the cortical thickness extracted with FreeSurfer differs for build 1 and build 2
FreeSurfer: Inter-OS Differences
Surface maps of mean absolute difference, standard deviation of absolute difference, t-statistics and RFT significance values showing regions where the cortical thickness extracted with FreeSurfer differs for cluster A and cluster B
Science reproducibility
software environments:
rerun the analyses from my publication… (looking at everyone)
and how… (looking at the PIs)
Figure: Machine 1 and Machine 2, each running its own stack of Operating system (OS), Libraries/Binaries and Applications
The problem statement
Collaboration with your colleagues and everyone else
Figure: the stacks (Operating system (OS), Libraries/Binaries, Applications) of Machine 1 and Machine 2 do not match (X)
The problem statement
Freedom to experiment
The problem statement
Are we all doomed to live in an unreproducible world, forced to painfully adapt and check every script we find?
Well… maybe, but you could also learn to use virtualization techniques…
The problem statement
Outline for this session
https://twitter.com/OfficeMemes_/status/1298982572848869380/photo/1
Virtualization technologies aim to
Figure: computing env 1, computing env 2 and Machine 3, each with its own Operating system (OS), Libraries/Binaries and Applications
Introduction to virtualization
Virtualization technologies have 3 main types:
this session
Introduction to virtualization
Outline for this session
https://twitter.com/OfficeMemes_/status/1298982572848869380/photo/1
Once
VIRTUALIZATION WARS
Virtualization using venv & conda
The research galaxy went down a dark path of non-existent reproducibility. A small alliance of brave Python-based resources aims to restore the balance and asks you to join their movement…
Virtual environments in Python
Virtualization using Python
Figure: a computing env consisting of Operating system (OS), Libraries/Binaries and Applications
conda create -n *name* python=*python_version* *libraries*
where *name* is the name of your virtual environment, *python_version* the Python version you want to use (e.g. 3.7) and *libraries* the libraries you want to install, separated by spaces
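To make the placeholders concrete, here is a small Python sketch that assembles the command as a list of arguments. build_conda_create is a hypothetical helper: it only builds the command and executes nothing. Handing the list to subprocess.run() yourself would run it, assuming conda is on your PATH.

```python
from __future__ import annotations


def build_conda_create(name: str, python_version: str, libraries: list[str],
                       assume_yes: bool = False) -> list[str]:
    """Assemble `conda create [-y] -n <name> python=<version> <libraries...>`.

    Nothing is executed here; the list could be passed to subprocess.run().
    """
    cmd = ["conda", "create"]
    if assume_yes:
        cmd.append("-y")  # skip conda's confirmation prompt
    # Note: the Python version is passed as the spec "python=<version>"
    cmd += ["-n", name, f"python={python_version}", *libraries]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_conda_create("r2d2", "3.7", ["pandas"], assume_yes=True)))
    # conda create -y -n r2d2 python=3.7 pandas
```

The libraries end up as separate arguments, which is why they are separated by spaces (not commas) on the command line.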
mkdir /Users/path/Desktop/mos_eisley
cd /Users/path/Desktop/mos_eisley
Virtualization using python - conda
conda create -y -n r2d2 python=3.7 pandas
conda activate r2d2
Virtualization using python - conda
conda info --envs
By default, conda environments are created in the envs directory of the conda installation path.
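After activating an environment, you can double-check from within Python which interpreter and environment are actually in use. This is a minimal sketch relying on sys.executable and sys.prefix. It also reads the CONDA_DEFAULT_ENV variable that conda activate sets, which is None when no conda environment is active. describe_environment is a hypothetical name.

```python
import os
import platform
import sys


def describe_environment() -> dict:
    """Collect basic facts about the interpreter that is running this code."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,              # which python binary is running
        "prefix": sys.prefix,                      # root of the active environment
        # True inside a venv; conda environments usually report False here
        "in_venv": sys.prefix != sys.base_prefix,
        # Set by `conda activate`; None if no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this inside r2d2 should show an executable below the r2d2 directory listed by conda info --envs.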
Virtualization using python - conda
python fancy_analyzes.py
conda is powerful but still requires caution.
conda activate
conda deactivate
Virtualization using venv & conda
I find your lack of controlling and evaluating installation processes disturbing.
We installed pandas, but not the missing requests library. You have to evaluate which libraries you need and gather the list of libraries that are imported, and thus needed, to run the script/pipeline!
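One way to gather that list is to let Python parse the script and collect its top-level imports. This is a sketch using the standard-library ast module, and list_imports and third_party are hypothetical helpers. Import names do not always match the package names you install (for example, the sklearn module comes from the scikit-learn package), so the result still needs checking.

```python
from __future__ import annotations

import ast
import sys
from pathlib import Path


def list_imports(source: str) -> list[str]:
    """Return the sorted top-level module names imported in `source`."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b as c` -> "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from a.b import c` -> "a"; relative imports (level > 0) are skipped
            modules.add(node.module.split(".")[0])
    return sorted(modules)


def third_party(modules: list[str]) -> list[str]:
    """Drop standard-library modules (sys.stdlib_module_names needs Python 3.10+)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return [m for m in modules if m not in stdlib]


if __name__ == "__main__":
    script = Path("fancy_analyzes.py")
    if script.exists():
        print(third_party(list_imports(script.read_text())))
```

On Python versions older than 3.10 the standard-library filter is skipped, so modules such as os would still appear in the list.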
Virtualization using venv & conda
Figure: a computing env consisting of Operating system (OS), Libraries/Binaries and Applications
conda install requests pandas matplotlib plotly ptitprince seaborn=0.11.0 pingouin statsmodels
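Pinning a version, as done for seaborn=0.11.0, only helps if you also check what is actually installed. This is a minimal sketch with the standard-library importlib.metadata (Python 3.8+). check_versions is a hypothetical helper, and it compares exact version strings only.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def check_versions(required: dict[str, str | None]) -> dict[str, str]:
    """Compare installed distributions against required versions.

    `required` maps a distribution name to an exact version string,
    or to None if any version is acceptable.
    Returns one problem description per package; an empty dict means all good.
    """
    problems = {}
    for name, wanted in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted is not None and installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    print(check_versions({"requests": None, "pandas": None, "seaborn": "0.11.0"}))
```

Calling a check like this at the top of a script such as fancy_analyzes.py would turn a silent version mismatch into an explicit message.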
Virtualization using venv & conda
python fancy_analyzes.py
conda is powerful but still requires caution.
conda activate
conda deactivate
Virtualization using venv & conda
https://giphy.com/gifs/disneyplus-the-mandalorian-mando-themandalorian-AcfTF7tyikWyroP0x73
conda env export > environment.yml
Virtualization using venv & conda
Virtualization using python - venv & conda
Very powerful conda is: environment manager (comparable to venv) and package manager (comparable to pip) it combines.
The specific Python version, builds and channels it shares.
Aware of the differences between conda & pip, and of other non-Python dependencies, you must be.
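For the venv side of the comparison, Python's standard-library venv module can create an environment programmatically (the usual command line form is python -m venv <directory>). This is a minimal sketch, and the directory name mos_eisley_venv is made up. with_pip=False keeps it fast; pass True to get pip inside the environment.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment and return the path to its python executable."""
    venv.create(env_dir, with_pip=with_pip, clear=True)
    # Windows puts executables in Scripts\, macOS/Linux in bin/
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = create_env(Path(tmp) / "mos_eisley_venv")
        print("Created:", python, python.exists())
        # pyvenv.cfg records which base interpreter the environment points to
        print((python.parent.parent / "pyvenv.cfg").read_text())
```

Unlike conda, venv reuses the Python version it was created with and cannot install a different one. On the command line it is activated with source <directory>/bin/activate (macOS/Linux) or <directory>\Scripts\activate (Windows), and left again with deactivate.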
Outline for this session
https://twitter.com/OfficeMemes_/status/1298982572848869380/photo/1
Outro/Q&A - The return of reproducibility?
Outro/Q&A - Recap for this session
Outro/Q&A
reproducible/scalable/efficient research
Outro/Q&A - Questions you could/should ask based on this session
Is virtualization required for each project no matter the scale?
When should virtualization be integrated into the workflow?
What are limitations & disadvantages of virtualization?
How should virtualized computing environments be provided?
Do I really need to use virtualization, even if I don’t share scripts?
What other factors contribute to the mentioned problems and can they also be addressed via virtualization?
https://giphy.com/gifs/season-17-the-simpsons-17x6-xT5LMB2WiOdjpB7K4o
Remember your training:
environment.yml
Outro/Q&A - homework assignment
Outro/Q&A - The return of reproducibility?
I’M VIRTUALIZATION
NOOOOOOOOOOOOO
Outro/Q&A - The return of reproducibility?
Table: tools arranged by interaction style (GUI, CLI) and by what is shared (SW, OS): Binder, conda, container, Binder/VMs
Outro/Q&A - Readings/add-on material for this session
Project T(eaching) I(ntegrity in) E(mpirical) R(esearch)
The Turing Way project illustration by Scriberia. Original version on Zenodo. http://doi.org/10.5281/zenodo.3695300.