1 of 39

February 12-16, 2024

WELCOME TO

DEPENDENCY MANAGEMENT 101

2 of 39

DEPENDENCY MANAGEMENT 101

Renata Curty

Research Data Services UCSB Library

rds@library.ucsb.edu

3 of 39

We will…

  • Discuss the significance of enhancing reproducibility in computational research
  • Present the root causes of "dependency hell"
  • Outline strategies to manage project dependencies more effectively

4 of 39

Scope

5 of 39

Reproducibility

Same data and code, different researchers, rendering same results

6 of 39

Computational Research

  • Not always “plug-and-play”
  • Many dependencies and nuanced interconnections
  • Runtime dependencies are needed to execute code and reproduce analyses

7 of 39

What is a dependency?

A directional connection between two or more elements, indicating a logical or sequential relationship among them.

If any of these elements fail, it poses a risk to both the efficiency of the process and the intended outcome.

8 of 39

Dependency Hell

Over time, libraries, packages, software and your computational environment evolve, introducing new versions and additional dependencies or even becoming outdated and unsupported.

This can lead to script failures and irreproducible research outcomes.

9 of 39

Image by: Savonen (2022).

I swear it worked on my machine…

10 of 39

Disclaimer: it may also happen to your future self

11 of 39

Another Example

Default values changing from TRUE to FALSE
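
One way to guard against a changed default is to state every behavior-critical argument explicitly. The sketch below is purely illustrative: `read_table_v1` and `read_table_v2` are hypothetical stand-ins for two releases of the same library function whose `as_category` default flipped from True to False. They do not come from any real package.

```python
# Hypothetical stand-ins for two releases of one library function.
def read_table_v1(values, as_category=True):
    """Older release: text values become sorted categories by default."""
    return sorted(set(values)) if as_category else list(values)


def read_table_v2(values, as_category=False):
    """Newer release: same function, but the default flipped to False."""
    return sorted(set(values)) if as_category else list(values)


data = ["b", "a", "b"]

# Relying on the default: the same call gives different results per release.
implicit_old = read_table_v1(data)
implicit_new = read_table_v2(data)

# Stating the argument explicitly: the result no longer depends on the release.
explicit_old = read_table_v1(data, as_category=True)
explicit_new = read_table_v2(data, as_category=True)
```

Explicit arguments do not replace version pinning, but they make a script less sensitive to this particular kind of change.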

12 of 39

Dependencies - Multiple Layers

  • System-level libraries and packages
  • System configurations
  • Operating system

13 of 39

Packages & Libraries

R: Packages are collections of R functions, data, and compiled code in a well-defined format, created to add specific functionality. The directories in R where the packages are stored are called libraries.

Python: A library is a collection of code that makes everyday tasks more efficient.

14 of 39

Declare your dependencies!

Always describe the computing environment and all the required packages and libraries along with their specific versions
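
Once versions are declared, they can also be checked. The sketch below assumes a Python project and uses the standard-library `importlib.metadata` module. The function names `installed_version` and `find_mismatches` are made up for this illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(declared, lookup=installed_version):
    """Compare declared versions against what is actually installed.

    declared maps package names to the exact versions the project expects.
    Returns {package: (expected, found)} for every package that differs.
    """
    mismatches = {}
    for package, expected in declared.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Running such a check at the top of an analysis script is one possible way to fail early, with a clear message, instead of failing halfway through.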

15 of 39

Recommendations

  • Document your computing environment (i.e., operating system, software, and versions)
  • Record all packages and libraries along with their versions
  • Use dependency management systems and virtual environments to automate and streamline the steps above
  • Consider using containers to capture dependencies and provide a runtime environment
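
For a Python project, the first two recommendations can be partly automated. This is a minimal sketch using only the standard library. `describe_environment` is an illustrative name, and the output format is an assumption rather than a required layout.

```python
import platform
from importlib import metadata


def describe_environment():
    """Collect the operating system, Python version, and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata lacks a name
            packages[name] = dist.version
    ordered = dict(sorted(packages.items(), key=lambda item: item[0].lower()))
    return {
        "operating_system": f"{platform.system()} {platform.release()}",
        "python_version": platform.python_version(),
        "packages": ordered,
    }


if __name__ == "__main__":
    env = describe_environment()
    print("OS:", env["operating_system"])
    print("Python:", env["python_version"])
    for name, version in env["packages"].items():
        print(f"{name}=={version}")
```

The printed text could be pasted into a README or usage notes. In R, `sessionInfo()` serves a similar purpose.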

16 of 39

Consider it as a spectrum

Left end: Static | Manual | Less Robust | Local Environment

Right end: Executable | Automated | More Robust | Independent Environment

Dependency management approaches vary and might complement one another to achieve better outcomes.

17 of 39

How?

  • Document your computing environment (i.e., operating system, software, and versions), packages and libraries

README

Usage Notes

Documentation

18 of 39

Where?

Documentation

19 of 39

Documentation

We can do better than that…

20 of 39

Documentation

A better example

21 of 39

How?

  • Include in your project folder a file that others can readily use to install required dependencies.

requirements.txt

install.R

sessionInfo()

More Detailed Documentation

environment.yml
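
As one possible way to produce such a file for Python, the sketch below writes a requirements.txt with exact versions for a chosen set of packages. It relies on the standard-library `importlib.metadata` module. The helper names are illustrative, not part of any tool.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return 'name==version' lines for the given installed packages.

    Packages that are not installed are skipped rather than guessed at.
    """
    lines = []
    for name in sorted(package_names, key=str.lower):
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue
    return lines


def write_requirements(package_names, path="requirements.txt"):
    """Write the pinned lines to a requirements file and return them."""
    lines = pinned_requirements(package_names)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

Others can then install the same versions with `pip install -r requirements.txt`.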

22 of 39

23 of 39

environment.yml
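
To give a sense of what an environment.yml contains, the sketch below builds the text of a minimal one. It assumes the conda package manager and the conda-forge channel. The environment name, package names, and versions are illustrative placeholders only.

```python
def render_environment_yml(name, python_version, packages):
    """Build the text of a minimal conda environment.yml.

    packages maps package names to exact versions.
    """
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",  # assumed channel; adjust to your setup
        "dependencies:",
        f"  - python={python_version}",
    ]
    for package, version in sorted(packages.items()):
        lines.append(f"  - {package}={version}")
    return "\n".join(lines) + "\n"


# Illustrative values only.
text = render_environment_yml(
    "my-analysis", "3.10", {"pandas": "2.2.0", "numpy": "1.26.4"}
)
print(text)
```

With conda installed, a file like this can be used to recreate the environment with `conda env create -f environment.yml`.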

24 of 39

Sliding a bit more to the right…

isolated | portable | reproducible

Renv

Venv

You can turn them on and off at will and work on multiple projects without one affecting another!

Virtual Environments

25 of 39

Renv & Venv

  • Isolation: enables adding or updating packages and libraries in one project without impacting other projects.
  • Portability: simplifies moving your projects across computers and platforms, streamlining required installations.
  • Reproducibility: pins precise package versions you rely on, ensuring consistent installations wherever you work.
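
Because isolation depends on the environment actually being active, a script can check for it. The sketch below covers Python's venv only. `in_virtual_environment` is an illustrative helper name.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Running inside a virtual environment:", sys.prefix)
    else:
        print("Running in the base Python installation.")
```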

bit.ly/r-venv

26 of 39

Why Renv?

There are other packages that help you manage dependencies in R, such as Groundhog, RANG, MiniCRAN, Require and Checkpoint.

Renv is more widely adopted and has the added advantage of being integrated with RStudio.

27 of 39

Why Venv?

venv is a built-in Python module (part of the standard library) that offers a simpler way to manage dependencies. It covers the basics of virtualenv, which requires a separate installation.
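
Since venv ships with Python, an environment can be created without installing anything extra. This is a minimal sketch. The folder name `.venv` is a common convention assumed here, and `create_project_environment` is an illustrative helper.

```python
import tempfile
import venv
from pathlib import Path


def create_project_environment(project_dir, env_name=".venv", with_pip=False):
    """Create a virtual environment inside a project folder.

    with_pip=False keeps this sketch fast and offline; pass with_pip=True
    in real use so packages can be installed into the environment.
    """
    env_path = Path(project_dir) / env_name
    venv.EnvBuilder(with_pip=with_pip).create(env_path)
    return env_path


if __name__ == "__main__":
    # A temporary folder stands in for a real project directory.
    with tempfile.TemporaryDirectory() as project:
        env = create_project_environment(project)
        print("Created:", env)
```

The command-line equivalent is `python -m venv .venv`, followed by activating the environment (for example, `source .venv/bin/activate` on Linux or macOS).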

28 of 39

Are there any limitations of using renv and venv?

29 of 39

Binder

Bundle your project and all its dependencies and make it available via a web browser.

Containers

30 of 39

Binder-ize your project!

  • Works if your project is in a repository (e.g., GitHub, Figshare, Zenodo, Dataverse)
  • Let others and your future self avoid the “dependency hell”
  • Allow easier access and let others interact with your project with no installation needed

31 of 39

How?

Add two files to your repo (root directory):

  1. “runtime.txt” - enter the R version or Python version in use

r-4.3.2-2023-03-15

python-3.10

  2. install.R (R) or requirements.txt (Python) file with packages/libraries info

See examples: https://github.com/binder-examples
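
For a Python project, the two files described above can be generated with a few lines. This is a sketch under stated assumptions: `write_binder_files` is an illustrative helper, and the package pins passed to it are placeholders.

```python
from pathlib import Path


def write_binder_files(repo_root, python_version, requirements):
    """Write runtime.txt and requirements.txt to the repository root.

    requirements is a list of lines such as "pandas==2.2.0".
    """
    root = Path(repo_root)
    runtime = root / "runtime.txt"
    reqs = root / "requirements.txt"
    # runtime.txt holds a single line, e.g. "python-3.10".
    runtime.write_text(f"python-{python_version}\n", encoding="utf-8")
    reqs.write_text("\n".join(requirements) + "\n", encoding="utf-8")
    return runtime, reqs


if __name__ == "__main__":
    # Placeholder pins; replace with the packages your project uses.
    write_binder_files(".", "3.10", ["pandas==2.2.0", "numpy==1.26.4"])
```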

32 of 39

  1. Select the appropriate repository and enter its URL or DOI
  2. Click launch
  3. Wait patiently for the environment to build.
  4. Share the link to your “binder-ized” repository, or copy the supplied code snippet into your repository README to add a badge so others can launch your project with a single click.
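
The link and badge from step 4 follow a predictable pattern. The sketch below assumes the public mybinder.org service and a repository hosted on GitHub. The owner and repository names are hypothetical, and the helper functions are illustrative.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref="HEAD"):
    """Build a mybinder.org launch link for a GitHub repository."""
    # Slashes in a branch name are percent-encoded so the ref stays one segment.
    return (
        "https://mybinder.org/v2/gh/"
        f"{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    )


def binder_badge(owner, repo, ref="HEAD"):
    """Build the Markdown snippet for a 'launch binder' badge."""
    link = binder_url(owner, repo, ref)
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({link})"


# Hypothetical repository used only for illustration.
print(binder_url("my-org", "my-analysis"))
print(binder_badge("my-org", "my-analysis"))
```

Repositories referenced by DOI (e.g., Zenodo) use a different link pattern, so copying the snippet supplied by the Binder form remains the safest route.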

33 of 39

34 of 39

Other Recommendations

  • Ensure you have updated all your packages before running your script locally
  • Note in the README file and usage notes when your analysis was last performed
  • Be patient while your Binder builds! It might take a couple of minutes

35 of 39

Binder Limitations

  • Not for heavy computing (<10 min and <10 MB)
  • The repository should be public
  • The repository should not require any personal or sensitive information (such as passwords)

36 of 39

In a Nutshell

Left end: Static | Manual | Less Robust | Local Environment

Right end: Executable | Automated | More Robust | Independent Environment

Approaches along the spectrum: README & Usage notes | Supply Install files | Virtual Environments (e.g., Renv & Venv) | Containers (e.g., Binder)

37 of 39

Caution: there is more to it!

  • Watch out for file paths (always use relative ones!)
  • Comment your code so others know the “whys” behind it
  • Data documentation remains key! Explain all your variables, follow a file naming convention and intuitive project structure.
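
On the point about paths, the sketch below shows one way to keep every file reference relative to the project folder in Python. `project_path` is an illustrative helper, and the folder and file names are hypothetical.

```python
from pathlib import Path


def project_path(project_root, *parts):
    """Build a path inside the project from relative parts only."""
    relative = Path(*parts)
    if relative.is_absolute():
        # An absolute path would only work on the author's machine.
        raise ValueError(f"Use a relative path, got: {relative}")
    return Path(project_root) / relative


# In a script, the project root can be derived from the script's own location:
# PROJECT_ROOT = Path(__file__).resolve().parent
data_file = project_path(".", "data", "raw", "survey.csv")
print(data_file)
```

Building paths with `pathlib` also avoids hard-coding a path separator, so the same script works across operating systems.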

38 of 39

Learn More

Data Literacy Series Handouts: https://rcd.ucsb.edu/data-literacy-series

Other Resources:

Renv documentation:

https://rstudio.github.io/renv/articles/renv.html

Venv documentation:

https://docs.python.org/3/library/venv.html

The Binder project:

https://jupyter.org/binder

39 of 39

Q&A

rds@library.ucsb.edu