1 of 75

Defragmentation Training School

Jupyter for interactive cloud computing

Guillaume Witz, PhD

Microscopy Imaging Center, Data Science Lab

University of Bern

1


3 of 75

“Classic” software vs. notebooks

3

4 of 75

Interactive computing with Jupyter

  • Simple text file
  • Rendered in the browser
  • Write and execute code
  • Display images and plots
  • Document code with Markdown
  • Executes step by step
  • Mainly used with Python
  • Can run other software (e.g. PyImageJ)
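Since a notebook is just a simple text file, the structure Jupyter renders can be seen directly: under the hood, an .ipynb file is JSON. A minimal sketch (cell contents are illustrative):

```python
# A notebook is a JSON text file: markdown cells document the code,
# code cells are executed step by step and collect their outputs.
import json

minimal_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Documented with Markdown"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["print('executed step by step')"]},
    ],
}

text = json.dumps(minimal_nb, indent=1)   # what the raw .ipynb file contains
cells = json.loads(text)["cells"]
print(len(cells), cells[0]["cell_type"])
```

Opening such a file in Jupyter renders the markdown cell and makes the code cell executable.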

5 of 75

Advantages of notebooks

  • Makes iterative design of analysis workflows practical in data science in general and microscopy in particular

5

6 of 75

A data science tool in academia and private companies

6

7 of 75

Advantages of notebooks

  • Makes iterative design of analysis workflows practical in data science in general and microscopy in particular
  • Documenting code / workflows and increasing their reproducibility

7

8 of 75

Using Jupyter for documentation

8

Notebooks can be turned into online (interactive) documentation; see https://guiwitz.github.io/microfilm/

9 of 75

Advantages of notebooks

  • Makes iterative design of analysis workflows practical in data science in general and microscopy in particular
  • Documenting code / workflows and increasing their reproducibility
  • Exploiting cloud resources

9

10 of 75

Exploiting cloud resources

For you, the user, “where it runs” doesn’t affect the interface

10

[Diagram] The notebook is displayed in the browser and sends computations to a kernel; the kernel performs them on a server and sends the results back. The server can run locally (your laptop) or remotely (a cluster, Google, etc.).
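As a rough analogy for the browser/kernel split above, the snippet below runs code in a separate Python process and only reads back the printed result; a real Jupyter kernel exchanges ZeroMQ messages rather than using pipes, but the division of labour is the same:

```python
# The frontend only "sends computations" and receives results;
# execution happens in a different process, possibly on a different
# machine. Here a separate Python process plays the kernel's role.
import subprocess
import sys

code = "print(6 * 7)"                        # the "computation" sent away
result = subprocess.run([sys.executable, "-c", code],
                        capture_output=True, text=True)
print(result.stdout.strip())                 # the "result" sent back
```

Because the interface only sees text going out and results coming in, it looks identical whether the executing process is local or remote.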

11 of 75

Why run Jupyter in the cloud?

11

  • Try new software independently of your own computer (package conflicts, incompatible hardware, etc.)
  • Access your project from anywhere (remote work)
  • Access specific hardware: GPUs, high RAM, etc.
  • Keep compute resources close to the data storage
  • Create demos that other people can run interactively:
    • code accompanying articles
    • courses
    • software documentation

12 of 75

For BioImage Analysis in the cloud, you need:

1. A running instance of Jupyter accessible via the web

2. The necessary software and packages to run the analysis

3. The necessary hardware to run the analysis

4. The data on which to apply the analysis

12

13 of 75

The many flavours of “cloud notebooks”

  • Free Jupyter sessions on cloud resources using open-source solutions
  • For-pay dedicated Jupyter services
  • Google Colab
  • Run Jupyter on for-pay cloud resources

13


15 of 75

Google Colab

Pros

  • Runs on Google infrastructure, with access to a free GPU
  • Great for demos and courses
  • Data storage via Google Drive

Cons

  • No way to adjust resources
  • No guarantee of service (free or at all)
  • Proprietary Jupyter variant: risk of broken features
  • Frozen environment

15

16 of 75

Google Colab

ZeroCostDL4Mic is a great project that exploits Colab for microscopy image processing

16

17 of 75

The many flavours of “cloud notebooks”

  • Free Jupyter sessions on cloud resources using open-source solutions
  • For-pay dedicated Jupyter services
  • Google Colab
  • Run Jupyter on for-pay cloud resources

17

18 of 75

Run Jupyter on for-pay cloud service

An “empty” machine on which you install Jupyter and other software, then use via a remote connection

18


20 of 75

Run Jupyter on for-pay cloud service

Pros

  • Highest flexibility in terms of computing resources / storage

Cons

  • Complex to set up and maintain
  • You are responsible for security
  • Costs are difficult to project

Packaged solutions such as Amazon SageMaker remove some of these difficulties.

An “empty” machine on which you install Jupyter and other software, then use via a remote connection

20

21 of 75

The many flavours of “cloud notebooks”

  • Free Jupyter sessions on cloud resources using open-source solutions
  • For-pay dedicated Jupyter services
  • Google Colab
  • Run Jupyter on for-pay cloud resources

21

22 of 75

Cloud computing works with containers

22


32 of 75

Cloud computing works with containers

A Docker container is an isolated place where software can run without affecting the rest of the computer or cluster. It:

  • Runs its own OS environment
  • Can contain any software, e.g.
    • Jupyter
    • specific packages (scikit-image, PyTorch, etc.)
    • conda
  • Can access data from “outside”
  • Can communicate with the outside, e.g. the Jupyter server
  • An image of it can be stored online and reused
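As a sketch, such a container image bundling Jupyter and specific packages could be described by a Dockerfile like this (base image from the community Jupyter Docker Stacks; the package list is illustrative):

```dockerfile
# Illustrative Dockerfile: start from a community Jupyter base image
# and add the specific packages the analysis needs.
FROM jupyter/minimal-notebook

# install analysis packages into the image's environment
RUN pip install scikit-image

# the base image already starts the Jupyter server, so the container
# can communicate with the "outside"
```

Building this image and pushing it to a registry is what makes it storable online and reusable.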

32

33 of 75

Cloud computing works with containers

33


34 of 75

The many flavours of “cloud notebooks”

  • Free Jupyter sessions on cloud resources using open-source solutions
  • For-pay dedicated Jupyter services
  • Google Colab
  • Run Jupyter on for-pay cloud resources

34


39 of 75

For-pay Jupyter services: Paperspace

  • Set up a Jupyter instance in the cloud in a few clicks
  • Pre-made environments with common packages
  • Adjustable computing resources with a free tier, including GPUs
  • Further customization via Docker and GitHub
  • Possibility to upload data (limited in the free tier)

39

40 of 75

For-pay Jupyter services: Paperspace

Pros

  • Easy to get started
  • Free GPU and some storage

Cons

  • Unclear where data are stored
  • Creating a custom environment via Docker is not straightforward
  • Sessions are limited in time
  • Can become expensive!

40

41 of 75

The many flavours of “cloud notebooks”

  • Free Jupyter sessions on cloud resources using open-source solutions
  • For-pay dedicated Jupyter services
  • Google Colab
  • Run Jupyter on for-pay cloud resources

41

42 of 75

Example 1: MyBinder

42

https://github.com/guiwitz/microfilm


44 of 75

Example 1: MyBinder

Pros

  • Very easy to run
  • Great for demos
  • Also runs other software such as RStudio
  • Run a virtual desktop for GUIs
  • Write your own Dockerfile for maximum freedom

Cons

  • Limited computing resources
  • Temporary sessions
  • No saving

44

45 of 75

Example 2: Renku

45

A platform for reproducible and collaborative data analysis, developed by the Swiss Data Science Center. It hosts code and data.

Keep your changes using Git!

46 of 75

Example 2: Renku

  • Free platform at https://renkulab.io/
  • Login via GitHub
  • Can also be set up on local infrastructure

46

47 of 75

Example 2: Renku

  • Create “projects” with specific software (via pip and conda)
  • Use templates (Docker images) as a basis
  • E.g. use Renku Desktop for GUIs like napari

47

48 of 75

Example 2: Renku

  • Create “projects” with specific software (via pip and conda)
  • Use templates (Docker images) as a basis
  • E.g. use Renku Desktop for GUIs like napari
  • Choose a configuration and start Jupyter

48

49 of 75

Example 2: Renku

  • Docker uses a base image and then installs packages from conda and pip
  • To permanently adjust the installed software you can:
    • add any software using apt-get
    • edit the requirements.txt and/or environment.yml file

49

52 of 75

Example 2: Renku

52

We need scikit-image in our project, so we add it to the environment.yml file. We can edit it directly in GitLab and commit!

53 of 75

Example 2: Renku

53

Committing automatically triggers the build of an updated Docker image with scikit-image installed.

54 of 75

Example 2: Renku

  • Possibility to include data in the project
  • Uses Git LFS
  • On the public instance, limited to a few GB

54

55 of 75

Example 2: Renku

Pros

  • Combines data and computing
  • Keeps changes via Git
  • Also runs other software such as RStudio
  • Run a virtual desktop for GUIs
  • Write your own Dockerfile for maximum freedom
  • Shareable and runnable without login. Try: https://renkulab.io/projects/guillaume.witz/micpy-workshop-2022

Cons

  • Rebuilding after updates can be slow
  • Limited resources on the public instance

55

56 of 75

Example 3: Galaxy

56

Bioinformatics tools that run on a computing cluster via a web-based platform, operated by the Freiburg Galaxy Team. Galaxy keeps your data and notebooks and offers Interactive Environments.

57 of 75

Example 3: Galaxy

57

58 of 75

Example 3: Galaxy

On https://usegalaxy.eu/ you get:

  • Access to pre-installed tools, mostly for genomics
  • Storage space for the data on which to perform computations
  • Computational resources on a cluster, via Docker

58

59 of 75

Example 3: Galaxy

  • Computing sessions are called Histories
  • A History contains:
    • data
    • documents (notebooks)
    • running tools

59

60 of 75

Example 3: Galaxy

Data can be uploaded to a History:

  • from a computer
  • from the web (e.g. Zenodo)
  • from a known databank (not yet relevant for imaging)

60


62 of 75

Example 3: Galaxy

  • One type of tool allows for interactive computing: in Tools, select JupyTool

62

63 of 75

Example 3: Galaxy

  • One type of tool allows for interactive computing: in Tools, select JupyTool
  • Under Active InteractiveTools you can find all your running sessions

63


66 of 75

Example 3: Galaxy

66

Galaxy and Jupyter “live” in separate places; data and notebooks can be pushed and pulled between the two worlds.

67 of 75

Example 3: Galaxy

67

To import data from the History, find its ID and call get(ID) in Jupyter. The data end up in the imports folder, e.g. /imports/18.
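A minimal sketch of that import step (the real get() helper comes with galaxy-ie-helpers and only works inside a Galaxy JupyTool session; the stub below just mimics the documented path convention):

```python
# Sketch of the Galaxy -> Jupyter import described above. Inside a
# Galaxy session, get() downloads the History dataset with the given
# ID into the /imports folder. This stub mimics that behaviour so the
# sketch runs anywhere; it does not contact a Galaxy server.
import os
import tempfile

IMPORTS_DIR = tempfile.mkdtemp()  # stands in for /imports inside Galaxy

def get(dataset_id):
    """Stub of Galaxy's get(): pretend to fetch the dataset with this
    ID and return its local path (e.g. /imports/18)."""
    path = os.path.join(IMPORTS_DIR, str(dataset_id))
    with open(path, "w") as f:
        f.write("fake dataset")   # the real helper downloads the data
    return path

path = get(18)
print(os.path.basename(path))
```

In a real session you would then open the file at the returned path, e.g. with an image reader.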

68 of 75

Example 3: Galaxy

68

Save notebooks or data in the current History using put(filename) in Jupyter.

69 of 75

Example 3: Galaxy

Run notebooks:

  • In the current default environment
  • In a new clean environment by using:

69

conda create -n defrag python=3.9
conda activate defrag
pip install ipykernel
pip install galaxy-ie-helpers
ipython kernel install --user --name="defrag"
pip install aicsimageio
pip install microfilm

70 of 75

Example 3: Galaxy

Run notebooks:

  • In the current default environment
  • In a new clean environment by using:

70

Now we can create a new notebook using that kernel!


71 of 75

Example 3: Galaxy

Pros

  • Easy to start
  • Large computational resources
  • Reasonable disk space

Cons

  • No simple way to persist environments
  • Data import/export is cumbersome

71

72 of 75

Thanks for your attention!

72

73 of 75

Exercises

73

74 of 75

Running Jupyter in Galaxy

74

75 of 75

Use MyBinder to turn a repo into a Jupyter session

  • Go to GitHub
  • Create an empty repository
  • Add a file by uploading the same notebook as in the previous exercise
  • Copy the repo address
  • Go to https://mybinder.org/, paste the repo address and hit the launch button
  • Wait, wait, wait…
  • Eventually, open the notebook and make sure you can run and edit it.
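Binder builds the session environment from configuration files committed to the repository, e.g. a requirements.txt next to the notebook. A sketch with purely illustrative package pins:

```python
# MyBinder reads dependency files (such as requirements.txt) from the
# repository root when building the session image. Here we write one
# programmatically; the pinned packages are illustrative examples —
# list whatever your notebook actually imports.
import tempfile
from pathlib import Path

repo_root = Path(tempfile.mkdtemp())  # stands in for your repo checkout
req = repo_root / "requirements.txt"
req.write_text("numpy==1.24.4\nmatplotlib==3.7.2\n")

print(req.read_text().splitlines()[0])
```

Pinning versions this way makes the Binder session reproducible: everyone who launches the repo gets the same environment.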

75