February 12-16, 2024
WELCOME TO
DEPENDENCY MANAGEMENT 101
Renata Curty
Research Data Services UCSB Library
rds@library.ucsb.edu
We will…
Scope
Reproducibility
Same data and code, different researchers, yielding the same results
Image by: Candace Savonen (2022).
Computational Research
What is a dependency?
A directional connection between two or more elements, indicating a logical or sequential relationship among them.
If any of these elements fail, it poses a risk to both the efficiency of the process and the intended outcome.
Dependency Hell
Over time, libraries, packages, software and your computational environment evolve, introducing new versions and additional dependencies or even becoming outdated and unsupported.
This can lead to script failures and irreproducible research outcomes.
Image by: Savonen (2022).
I swear it worked on my machine…
Disclaimer: it may also happen to your future self
Another Example
Default values changing from TRUE to FALSE
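A minimal Python sketch of how a changed default can silently alter results. The functions read_column_v1 and read_column_v2 and the as_categories flag are hypothetical, invented for illustration; the slide does not say which function it refers to. A well-known real case of this pattern is R 4.0.0, which changed the default of stringsAsFactors from TRUE to FALSE.

```python
def _read_column(values, as_categories):
    """Shared logic: optionally convert text values to integer category codes."""
    if as_categories:
        levels = sorted(set(values))
        return [levels.index(v) for v in values]
    return list(values)


def read_column_v1(values, as_categories=True):
    """Hypothetical old release: conversion to category codes is ON by default."""
    return _read_column(values, as_categories)


def read_column_v2(values, as_categories=False):
    """Hypothetical new release: same function, but the default flipped to OFF."""
    return _read_column(values, as_categories)


# The analysis script never sets the flag, so it inherits whatever the default is.
script_input = ["b", "a", "b"]
old_result = read_column_v1(script_input)
new_result = read_column_v2(script_input)

print("old release:", old_result)
print("new release:", new_result)

# Stating the argument explicitly makes the script independent of the default.
explicit_result = read_column_v2(script_input, as_categories=True)
print("explicit flag:", explicit_result)
```

The same script returns integer codes under the old release and plain strings under the new one. Passing the argument explicitly, and recording the package version, guards against this kind of drift.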
Dependencies - Multiple Layers
Packages & Libraries
R: Packages are collections of R functions, data, and compiled code in a well-defined format, created to add specific functionality. The directories in R where the packages are stored are called libraries.
Python: A library is a collection of reusable code (modules and packages) that you import rather than write yourself, making everyday tasks more efficient.
Declare your dependencies!
Always describe the computing environment and all the required packages and libraries along with their specific versions
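As one way to act on this recommendation in Python, the sketch below collects the interpreter version, operating system, and the installed versions of a chosen set of packages, using only the standard library. The function name describe_environment and the package list passed to it are illustrative assumptions, not part of the webinar material.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict with the Python version, OS, and versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }


# "pip" is only an example; list the packages your own project imports.
report = describe_environment(["pip"])
print("Python:", report["python"], f"({report['implementation']})")
print("OS:", report["os"])
for name, version in report["packages"].items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Pasting this output into a README gives readers roughly the information that sessionInfo() provides in R.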
Recommendations
Consider it as a spectrum
Static | Manual | Less Robust | Local Environment
Executable | Automated | More Robust | Independent Environment
Dependency management approaches vary and might complement one another to achieve better outcomes.
How?
README
Usage Notes
Documentation
Where?
Documentation
We can do better than that…
Documentation
A better example
How?
requirements.txt
install.R
sessionInfo()
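A requirements.txt file is a plain list of packages, ideally pinned to exact versions with "==". The sketch below shows one way to build such a file from Python. The helper names format_requirements and installed_versions are hypothetical, and the pandas and numpy versions are placeholders for illustration only.

```python
from importlib import metadata


def format_requirements(versions):
    """Turn {'name': 'version'} into requirements.txt text with exact pins."""
    ordered = sorted(versions.items(), key=lambda item: item[0].lower())
    lines = [f"{name}=={version}" for name, version in ordered]
    return "\n".join(lines) + "\n"


def installed_versions(names):
    """Look up the installed version of each package, skipping missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # leave out anything that is not installed here
    return found


# Placeholder versions, used only to show the resulting file format.
example = format_requirements({"pandas": "2.1.4", "numpy": "1.26.2"})
print(example, end="")

# In a real project you would pin what is actually installed, for example:
# text = format_requirements(installed_versions(["pandas", "numpy"]))
```

From the command line, `pip freeze > requirements.txt` produces a similar pinned list for everything installed in the active environment.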
More Detailed Documentation
environment.yml
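For readers who have not seen one, the sketch below prints a minimal conda environment.yml. It is generated from Python only to keep the examples in one language. The function render_environment_yml, the environment name, the conda-forge channel, and the package versions are all illustrative assumptions rather than anything shown in the webinar.

```python
def render_environment_yml(name, python_version, packages, channels=("conda-forge",)):
    """Build the text of a minimal conda environment.yml with pinned versions."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines.append(f"  - python={python_version}")
    # conda pins use a single "=", unlike the "==" used in requirements.txt
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(packages.items())]
    return "\n".join(lines) + "\n"


# Placeholder project name and versions, for illustration only.
yml_text = render_environment_yml(
    "my-analysis", "3.10", {"pandas": "2.1.4", "numpy": "1.26.2"}
)
print(yml_text, end="")
```

With the file saved as environment.yml, `conda env create -f environment.yml` rebuilds the environment on another machine.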
Sliding a bit more to the right…
isolated | portable | reproducible
Renv
Venv
You can turn a virtual environment on and off as you move between projects, without one project affecting another!
Virtual Environments
Renv & Venv
bit.ly/r-venv
Why Renv?
There are other packages to help you manage dependencies in R, such as groundhog, rang, miniCRAN, Require, and checkpoint.
Renv is more widely adopted and has the added advantage of being integrated with RStudio.
Why Venv?
The venv module is built into Python (part of the standard library). It offers a simpler way to manage dependencies and covers the basics of virtualenv, which requires a separate installation.
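Because venv ships with Python, an environment can be created without installing anything extra. The sketch below creates one in a temporary folder and checks whether the current interpreter is itself running inside a virtual environment. The helper names create_project_env and in_virtualenv and the ".venv" folder name are assumptions chosen for illustration.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def create_project_env(project_dir):
    """Create <project_dir>/.venv with the standard-library venv module."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps this sketch quick; a real project usually wants pip.
    venv.create(str(env_dir), with_pip=False)
    return env_dir


# A temporary folder stands in for a real project directory.
with tempfile.TemporaryDirectory() as project:
    env_path = create_project_env(project)
    # Every venv contains a pyvenv.cfg file at its root.
    config_exists = (env_path / "pyvenv.cfg").is_file()
    print("environment created:", config_exists)

print("running inside a virtual environment:", in_virtualenv())
```

The usual command-line equivalent is `python -m venv .venv`, followed by `source .venv/bin/activate` on macOS/Linux or `.venv\Scripts\activate` on Windows to turn the environment on, and `deactivate` to turn it off.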
Are there any limitations of using renv and venv?
Binder
Bundle your project and all its dependencies and make it available via a web browser.
Containers
Image by: Taka & Thiéry. (2018)
Binder-ize your project!
How?
Add two files to your repo (root directory):
runtime.txt (R): r-4.3.2-2023-03-15
runtime.txt (Python): python-3.10
See examples: https://github.com/binder-examples
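The two version strings above appear to follow the runtime.txt convention that Binder reads through repo2docker: "r-<version>-<YYYY>-<MM>-<DD>" for R, where the date selects a package snapshot, and "python-<version>" for Python. The sketch below, written under that assumption, checks a string before you commit it. The function parse_runtime and its regular expressions are hypothetical helpers, not part of Binder.

```python
import re
from datetime import date

# Assumed formats: "r-4.3.2-2023-03-15" (R version optional) and "python-3.10".
R_PATTERN = re.compile(
    r"^r-(?:(?P<version>\d+(?:\.\d+){1,2})-)?"
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$"
)
PY_PATTERN = re.compile(r"^python-(?P<version>\d+(?:\.\d+)*)$")


def parse_runtime(text):
    """Parse one runtime.txt line; raise ValueError if it matches neither form."""
    line = text.strip()
    match = R_PATTERN.match(line)
    if match:
        snapshot = date(
            int(match["year"]), int(match["month"]), int(match["day"])
        )
        return {"language": "r", "version": match["version"], "snapshot": snapshot}
    match = PY_PATTERN.match(line)
    if match:
        return {"language": "python", "version": match["version"], "snapshot": None}
    raise ValueError(f"unrecognised runtime.txt entry: {text!r}")


r_runtime = parse_runtime("r-4.3.2-2023-03-15")
py_runtime = parse_runtime("python-3.10")
print(r_runtime)
print(py_runtime)
```

A check like this catches typos, such as a missing day in the snapshot date, before the Binder build fails.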
Other Recommendations
Binder Limitations
Static | Manual | Less Robust | Local Environment
Executable | Automated | More Robust | Independent Environment
In a Nutshell
README & Usage Notes
Virtual Environments (e.g., Renv & Venv)
Containers (e.g., Binder)
Supply Install files
Caution: there is more to it!
Learn More
Data Literacy Series Handouts: https://rcd.ucsb.edu/data-literacy-series
Other Resources:
Renv documentation:
https://rstudio.github.io/renv/articles/renv.html
Venv documentation:
https://docs.python.org/3/library/venv.html
The Binder project:
Q&A
rds@library.ucsb.edu