1 of 22

Intro

Rok Roškar, Swiss Data Science Center, ETH Zürich

rok.roskar@sdsc.ethz.ch

12 September 2023

2 of 22


3 of 22

Outline

  1. What is Renku?
  2. Renku Hands-on Tutorial
  3. What’s ahead?

4 of 22

Renku enables “sustainable” data science

  1. Done today with tomorrow in mind
  2. Individual work benefits institution and community
  3. All components form a functioning ecosystem


5 of 22


A story…

From development to publication:

  • Remembering how to re-run code to update a figure…
  • Setting up the project environment… more than once
  • Inefficient sharing of computational resources…
  • One year later, trying to share project materials…

Many practical and technical hurdles along the way from data to results…

6 of 22


Option 2: Renku streamlines the behind-the-scenes work, so you can focus on the fun parts of data science!

From development to publication:

  • Set up one containerized environment and take it wherever you need it
  • Comprehensible workflows make reusing code a breeze
  • Take advantage of flexible & efficient compute infrastructure
  • Showcase your work

7 of 22

The Knowledge Pyramid: questions & incentives


Working on data: individual researcher(s)

  • I can’t reuse my teammate’s work because I don’t understand it.
  • I spend extra time re-running my own code.
  • Sharing my code and data with others requires extra effort.

Organization: lab, group, institute

  • How can we make sure work is not wasted with each new student turnover?
  • How can I showcase my group’s work?
  • How can my work make efficient use of compute resources?

Knowledge: community, funding body, university, national org

  • What work has been done in this domain?
  • How is funding money being spent?
  • How can we connect researchers within and across domains?

Addressing the points of friction bubbles knowledge upwards.

8 of 22

What is Renku?


9 of 22

Code


10 of 22

Datasets

  • Create datasets to easily reuse and share data across projects
  • Use various storage backends: Git LFS, S3, Azure Blob Storage, and local or network storage
  • Combine with compute environments and analysis examples to ensure data can easily be used and reused
  • Record pipelines that yield or consume datasets for full traceability


Create (choose storage), assemble, annotate, publish

11 of 22

Workflows


  • Capture workflow: renku run my-analysis.sh
  • Record as knowledge graph (KG)
  • Optimized storage
  • Reuse on various backends: Toil on HPC, with additional backends via plug-ins

12 of 22

Automatically record pipelines
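The workflow slides describe the idea behind `renku run`: run a command, record what it read and wrote, and keep that record so a result can be traced back to its inputs and re-created when they change. The toy Python sketch below illustrates only that idea. It is not Renku's implementation or API. Every name in it (`ProvenanceLog`, `Activity`, `lineage`, `stale`) is hypothetical, inputs and outputs are declared by hand, and files are compared by SHA-256 checksum (an assumption chosen for simplicity). Requires Python 3.9+.

```python
import hashlib
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


def checksum(path: Path) -> str:
    """Content hash, so the log can tell whether a file actually changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


@dataclass
class Activity:
    """One recorded execution: which step ran, what it read, what it wrote."""
    name: str
    inputs: dict[str, str]   # path -> checksum when the step ran
    outputs: dict[str, str]  # path -> checksum right after the step ran


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory provenance store (not Renku's knowledge graph)."""
    activities: list[Activity] = field(default_factory=list)

    def run(self, name: str, step: Callable[[], None],
            inputs: list[Path], outputs: list[Path]) -> None:
        """Execute `step`, then record its declared inputs and outputs."""
        before = {str(p): checksum(p) for p in inputs}
        step()
        after = {str(p): checksum(p) for p in outputs}
        self.activities.append(Activity(name, before, after))

    def lineage(self, path: Path) -> list[str]:
        """Steps that (transitively) produced `path`, upstream steps first."""
        for act in reversed(self.activities):  # most recent producer wins
            if str(path) in act.outputs:
                chain: list[str] = []
                for inp in act.inputs:
                    for step_name in self.lineage(Path(inp)):
                        if step_name not in chain:
                            chain.append(step_name)
                return chain + [act.name]
        return []  # nothing recorded as producing it: a raw input

    def stale(self) -> list[str]:
        """Steps whose recorded inputs differ from what is on disk now.

        Only the latest run of each step is checked, and only direct input
        changes are detected; a downstream step becomes stale once its
        upstream step is re-run and rewrites its outputs.
        """
        latest = {act.name: act for act in self.activities}  # later runs overwrite
        return [name for name, act in latest.items()
                if any(checksum(Path(p)) != h for p, h in act.inputs.items())]


# Toy pipeline: raw.csv -> clean.csv -> total.txt
with tempfile.TemporaryDirectory() as tmp:
    raw, clean, total = (Path(tmp) / n for n in ("raw.csv", "clean.csv", "total.txt"))
    raw.write_text("x\n1\n2\n3\n")

    def clean_step() -> None:
        # Drop blank lines from the raw file.
        lines = [ln for ln in raw.read_text().splitlines() if ln.strip()]
        clean.write_text("\n".join(lines) + "\n")

    def total_step() -> None:
        # Sum the numeric column, skipping the header row.
        values = clean.read_text().split()[1:]
        total.write_text(str(sum(int(v) for v in values)))

    log = ProvenanceLog()
    log.run("clean", clean_step, inputs=[raw], outputs=[clean])
    log.run("total", total_step, inputs=[clean], outputs=[total])

    total_value = total.read_text()
    total_lineage = log.lineage(total)
    stale_before = log.stale()
    raw.write_text("x\n1\n2\n3\n4\n")  # the raw data changes
    stale_after = log.stale()

print(total_lineage)  # ['clean', 'total']
print(stale_after)    # ['clean']
```

The `stale()` check speaks to the "remembering how to re-run code to update a figure" hurdle from the story slide: once provenance is recorded, the log, rather than memory, says which step needs to be re-run.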


13 of 22

Environment

  • Easy access to shared compute and storage
  • Containers for reproducibility and portability, templates for consistency
  • Maintained library of images to keep things up to date; install apps, dashboards, desktops, etc.
  • Configurable access to resources
  • Shared project data sources

14 of 22

For the user, there is NO vendor or technology lock-in, apart from Git and Docker

15 of 22

The Renku Stack

https://renkulab.io

command-line interface (CLI)

16 of 22


17 of 22


Why use Renku?

Renku helps data scientists improve their computational skills, wherever they are on their journey.

  • Code & Data Versioning: harness code & data versioning
  • Workflows: track & show file lineage via a workflow tool
  • Environment Management: containerize your compute environment
  • Metadata Standards: use the knowledge graph for <creativity>

18 of 22

What’s ahead?


19 of 22

Where is Renku used?

  • Public instance at renkulab.io; several other smaller instances
  • Primary use-cases:
    • Teaching (courses, workshops)
    • Small teams working on data analysis projects
    • Showcasing of derived datasets and results
    • Improving reusability of data products
  • Used as a “connecting” piece (e.g. enabling collaborative access to data products from other platforms)


20 of 22

What’s ahead

  • More flexible/easier “on-boarding” of users, projects, data
  • More comprehensive overviews of where data is used
  • Better integration with data providers
  • Our goal is to make data “alive” – how can we do better? What would you imagine to be useful for your community?
  • Renku as the “middle layer”/connector of code, compute, data
  • Organizational/group views, “hub pages”


21 of 22

Organizational “hub” view


22 of 22

We want to hear from you!

🙋 Try out Renku

  • renkulab.io - Public

📃 Renku Docs

❓ Run into a problem?

  • Post on Discourse (our forum)
  • Submit a bug report

💡 Feature Request?

  • Discourse!
