Interactive Distributed Computing with Jupyter and Friends

Shreyas Cholia (scholia@lbl.gov)

Lawrence Berkeley National Laboratory

2019 Pangeo Community Meeting


Why Interactive Distributed Deep Learning?

  • Using CNNs for ATLAS LHC image data classification
  • Human-in-the-loop hyperparameter optimization and distributed training


Technology

IPython Parallel (ipyparallel)

  • Hub and Controller communicate with a set of engines (IPython kernels); see the sketch below
    • E.g., a LoadBalancedView gives ‘destination-agnostic’ scheduling
    • Currently a single-controller bottleneck, but only for notebook communication
    • Use other MPI libraries (e.g., Horovod, mpi4py) alongside for bulk communication
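A minimal sketch of the scheduling pattern, assuming a cluster is already running and reachable from the notebook; the training function and configurations are placeholders:

    import ipyparallel as ipp

    # Connect to the running controller (engines register with it).
    rc = ipp.Client()

    # A LoadBalancedView sends each task to whichever engine is free,
    # i.e. 'destination-agnostic' scheduling.
    lview = rc.load_balanced_view()

    def train_one(config):
        # Placeholder for a per-configuration training task.
        return {"config": config, "loss": 0.0}

    ar = lview.map_async(train_one, [{"lr": 1e-3}, {"lr": 1e-2}])
    print(ar.get())  # blocks until all tasks finish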

QGrid

  • Quantopian’s widget for interactive, sortable tables (example below)
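A minimal sketch; the column names are placeholders:

    import pandas as pd
    import qgrid

    # Any DataFrame renders as an interactive table widget: sortable,
    # filterable, and editable in place in the notebook.
    df = pd.DataFrame({"trial": [0, 1], "loss": [0.92, 0.71], "accuracy": [0.55, 0.68]})
    qgrid.show_grid(df, show_toolbar=True)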

bqplot

  • Interactive visualization of results (example below)
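Because bqplot marks are ipywidgets, assigning new data to a mark’s traits redraws the plot in place; a minimal sketch:

    import numpy as np
    from bqplot import pyplot as plt

    fig = plt.figure(title="Loss per epoch")
    line = plt.plot(np.arange(5), np.random.rand(5))
    plt.show()

    # Later, e.g. from a monitoring loop: mutate the mark's traits and
    # the plot updates live, with no re-plotting needed.
    line.y = np.random.rand(5)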

Kale

  • Our manager and worker services that wrap backend tasks, giving fine-grained task control and node resource monitoring


NERSC Jupyter architecture

  • Allocate nodes on the Cori interactive queue and start an ipyparallel or Dask cluster
    • Developed a %ipcluster magic to set this up from within the notebook (sketch below)
  • Compute nodes traditionally do not have external addresses
    • Required network configuration / policy decisions
  • Distributed training communication is via MPI (Horovod or the Cray ML Plugin)
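A sketch of the notebook-side workflow; the %ipcluster arguments shown are assumptions (the real magic is part of the NERSC setup), and the engine count is a placeholder:

    # Hypothetical arguments for the %ipcluster magic described above;
    # the actual interface may differ.
    %ipcluster start -n 64

    import time
    import ipyparallel as ipp

    # Connect and wait until all engines have registered with the controller.
    rc = ipp.Client()
    while len(rc.ids) < 64:
        time.sleep(1)
    print(f"{len(rc.ids)} engines ready")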


[Architecture diagram: a JupyterHub web server and the notebook server process run on a Cori login node; the notebook kernel / ipyparallel client connects to the ipyparallel or Dask controller, which drives engine/kernel processes on the Cori compute nodes; engines communicate with each other over MPI, and login and compute nodes share the Cori filesystems.]


Plots update live.

The table shows the different configurations:

  • Status
  • Current loss and accuracy
  • Sortable columns

Further quantities and interaction buttons can be added to the plot (a sketch of the live-update pattern follows the repo link below).

https://github.com/sparticlesteve/cori-intml-examples/
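The repository above contains the full notebooks; below is a minimal sketch of the assumed live-update pattern, where engine-side tasks stream metrics via ipyparallel’s publish_data and the client polls AsyncResult.data into bqplot marks (names and data shapes are illustrative):

    import time
    from bqplot import pyplot as plt

    fig = plt.figure(title="Current loss per configuration")
    bars = plt.bar(x=list(range(4)), y=[0.0] * 4)
    plt.show()

    def monitor(async_results, interval=2.0):
        # Each remote task is assumed to call
        # ipyparallel.datapub.publish_data({"loss": ...}) as it trains.
        while not all(ar.ready() for ar in async_results):
            ys = []
            for ar in async_results:
                d = ar.data  # published data; exact shape varies by version
                if isinstance(d, list):  # may be one dict per engine
                    d = d[0] if d else {}
                ys.append((d or {}).get("loss", 0.0))
            bars.y = ys  # mutating the mark redraws the plot in place
            time.sleep(interval)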


Other Use Cases: NCEM

  • Interactive exploration and analysis of electron microscope images via Jupyter
  • Serial processing of 4D image arrays in NumPy; looking to speed this up via Dask/ipyparallel (sketch below)
  • Achieved a 20x speedup on the NCEM py4DSTEM notebook running on HPC resources at NERSC

https://github.com/py4dstem/py4DSTEM/blob
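A sketch of the Dask approach mentioned above; the array shape, chunking, and reduction are illustrative assumptions, not py4DSTEM’s actual pipeline:

    import dask.array as da

    # 4D-STEM data: a 2D grid of scan positions, each holding a 2D
    # diffraction pattern. Chunking over scan positions lets Dask process
    # blocks of patterns in parallel instead of serially in NumPy.
    data = da.random.random((128, 128, 256, 256), chunks=(16, 16, 256, 256))

    # Reduce each diffraction pattern to one intensity: a "virtual image".
    virtual_image = data.sum(axis=(2, 3)).compute()
    print(virtual_image.shape)  # (128, 128)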