Intro
Rok Roškar, Swiss Data Science Center, ETH Zürich
rok.roskar@sdsc.ethz.ch
12 September 2023
Slides: https://bit.ly/renku-hdbi-slides
Outline
1. What is Renku?
2. Renku Hands-on Tutorial
3. What’s ahead?
Renku enables “sustainable” data science
a story…
Development
Publication
Remembering how to re-run code to update a figure…
Set up the project environment… more than once
Inefficient sharing of computational resources…
1 year later, trying to share project materials…
Many practical and technical hurdles along the way from data to results…
Development
Publication
Option 2
Renku streamlines the behind-the-scenes work, so you can focus on the fun parts of data science!
Set up 1 containerized environment, and take it wherever you need it
Comprehensible workflows make reusing code a breeze
Take advantage of flexible & efficient compute infrastructure
Showcase your work
The Knowledge Pyramid: questions & incentives
Individual researcher(s)
Knowledge
Organization
Working on Data
Lab, group, institute
Community, funding body, university, national org
I can’t reuse my teammate’s work because I don’t understand it.
I spend extra time re-running my own code
Sharing my code and data with others requires extra effort.
How can we make sure work is not wasted with each new student turnover?
How can I showcase my group’s work?
How can my work make efficient use of compute resources?
What work has been done in this domain?
How is funding money being spent?
How can we connect researchers within and across domains?
Addressing the points of friction bubbles knowledge upwards
What is Renku?
Code
Datasets
Create (choose storage), assemble, annotate, publish
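The four dataset steps above can be pictured with a small toy sketch. This is not Renku's implementation or its metadata format; the function names (`file_checksum`, `build_dataset_record`), the record fields, and the file `measurements.csv` are all hypothetical, chosen only to show how a dataset can carry its own description and file checksums when it is shared.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_dataset_record(name, files, keywords=(), description=""):
    """Create a dataset record, assemble its files, and annotate it.

    Hypothetical structure for illustration only.
    """
    return {
        "name": name,                      # create
        "description": description,        # annotate
        "keywords": sorted(keywords),      # annotate
        "files": [                         # assemble
            {"path": Path(f).name, "sha256": file_checksum(f)} for f in files
        ],
    }


with tempfile.TemporaryDirectory() as tmp:
    # A throwaway data file standing in for real project data.
    data_file = Path(tmp) / "measurements.csv"
    data_file.write_text("sample,value\na,1\nb,2\n")

    record = build_dataset_record(
        "demo-dataset",
        [data_file],
        keywords=["tutorial", "demo"],
        description="Toy dataset for the hands-on session",
    )
    # "Publishing" here just means serializing the record so it can travel
    # together with the data.
    record_json = json.dumps(record, indent=2)

print(record_json)
```

Keeping a checksum next to each file lets a recipient verify that the data they received is the data that was described.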
Workflows
Capture workflow
Record as knowledge graph (KG)
Optimized storage
Reuse on various backends
Toil on HPC
renku run my-analysis.sh
Additional via plug-ins
Automatically record pipelines
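To make the idea of a recorded workflow concrete, here is a minimal toy sketch of a lineage graph. It is an illustration under stated assumptions, not how Renku stores its knowledge graph: the classes `Step` and `LineageGraph`, the method names, and the file names (`raw.csv`, `clean.py`, `plot.py`, `figure.png`) are all hypothetical. The point is that once each command's inputs and outputs are recorded, it becomes possible to ask which results are out of date after a file changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One recorded command execution: what it read and what it wrote."""
    command: str
    inputs: tuple
    outputs: tuple


@dataclass
class LineageGraph:
    """A toy record of executed steps (hypothetical, for illustration)."""
    steps: list = field(default_factory=list)

    def record(self, command, inputs, outputs):
        """Store one executed command together with its inputs and outputs."""
        self.steps.append(Step(command, tuple(inputs), tuple(outputs)))

    def downstream(self, changed_file):
        """Return every file that depends, directly or transitively,
        on changed_file."""
        stale = set()
        frontier = [changed_file]
        while frontier:
            current = frontier.pop()
            for step in self.steps:
                if current in step.inputs:
                    for out in step.outputs:
                        if out not in stale:
                            stale.add(out)
                            frontier.append(out)
        return stale


graph = LineageGraph()
graph.record("python clean.py", inputs=["raw.csv", "clean.py"], outputs=["clean.csv"])
graph.record("python plot.py", inputs=["clean.csv", "plot.py"], outputs=["figure.png"])

# Changing the raw data invalidates everything derived from it.
stale_after_raw_change = sorted(graph.downstream("raw.csv"))
# Changing only the plotting script invalidates just the figure.
stale_after_plot_change = sorted(graph.downstream("plot.py"))

print(stale_after_raw_change)   # ['clean.csv', 'figure.png']
print(stale_after_plot_change)  # ['figure.png']
```

This is the same question as "remembering how to re-run code to update a figure" from the story earlier: with lineage recorded, the answer can be looked up instead of remembered.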
Environment
For the user, there is NO vendor or technology lock-in apart from Git + Docker
The Renku Stack
https://renkulab.io
command-line interface (CLI)
Hands-on part!
Harness code & data versioning
Use the knowledge graph for <creativity>
Containerize your compute environment
Track & show file lineage via a workflow tool
Renku helps data scientists improve their computational skills, wherever they are on their journey.
Code & Data Versioning
Workflows
Environment Management
Metadata Standards
Why use Renku?
What’s ahead?
Where is Renku used?
What’s ahead
Organizational “hub” view
We want to hear from you!
🙋 Try out Renku
❓ Run into a problem?
💡 Feature Request?