1 of 5

PanDA/Dask Integration

Ideas, plans and status

P. Nilsson (BNL), F. Barreiro Megino (UTA), T. Maeno (BNL)

Google Technical Interchange Meeting - July 14, 2021

2 of 5

Project Goals

  • Provide a PanDA queue on Google resources with a single-user Dask setup

    • To allow users to submit Dask tasks to PanDA
    • Use the single-user Dask setup as an alternative to Dask Gateway

    • Job submission primarily with prun; Jupyter would be a complementary option
    • A dedicated PanDA/JEDI instance could optionally process Dask jobs to reduce waiting times

3 of 5

PanDA/Dask Integration Overview

  • User submits Dask script + relevant image reference with prun
  • JEDI creates corresponding PanDA job
  • Harvester instance fetches job from server
  • Harvester creates and submits bundle with head pod containing PanDA Pilot + Dask Control
  • Dask Control code starts Dask scheduler and all required Dask workers
    • Communicates with Pilot via shared directory
  • Pilot fetches input data
    • How can workers know when to start?
      • User script is not executed by Dask control until Pilot has fetched input
  • Pilot transfers output asynchronously when discovered
  • Note: Pilot does not execute the payload (as in the Horovod model)
    • Additional Pilot tasks could be developed
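
The shared-directory handshake between Pilot and Dask Control described above can be sketched as follows. This is only an illustration under stated assumptions: the marker file name `stagein_done`, the `*.out` output pattern and all function names are hypothetical and are not taken from the Pilot or Harvester code.

```python
import tempfile
import time
from pathlib import Path

# Hypothetical marker file name; the real Pilot/Dask Control convention may differ.
STAGEIN_DONE = "stagein_done"


def signal_stagein_done(shared_dir: Path) -> None:
    """Pilot side: announce that all input files are in place."""
    tmp = shared_dir / (STAGEIN_DONE + ".tmp")
    tmp.write_text("ok\n")
    # A rename within one filesystem is atomic on POSIX, so the reader
    # never observes a half-written marker.
    tmp.replace(shared_dir / STAGEIN_DONE)


def wait_for_stagein(shared_dir: Path, timeout: float = 600.0, poll: float = 1.0) -> bool:
    """Dask Control side: block until the marker appears or the timeout expires."""
    marker = shared_dir / STAGEIN_DONE
    deadline = time.monotonic() + timeout
    while not marker.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def discover_new_outputs(shared_dir: Path, seen: set, pattern: str = "*.out") -> list:
    """Pilot side: return output files that have not been reported before."""
    new = sorted(p for p in shared_dir.glob(pattern) if p.name not in seen)
    seen.update(p.name for p in new)
    return new


def demo() -> tuple:
    """Walk through the handshake in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        before = wait_for_stagein(shared, timeout=0.1, poll=0.02)  # no marker yet
        signal_stagein_done(shared)
        after = wait_for_stagein(shared, timeout=0.1, poll=0.02)   # marker present
        (shared / "result.out").write_text("payload output\n")
        seen = set()
        first = [p.name for p in discover_new_outputs(shared, seen)]
        second = [p.name for p in discover_new_outputs(shared, seen)]
        return before, after, first, second
```

In this sketch Dask Control would call `wait_for_stagein` before running the user script, while the Pilot would call `discover_new_outputs` periodically and transfer whatever it returns.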

4 of 5

PanDA vs K8s cluster

[Figure: PanDA vs K8s cluster diagram. Recoverable labels: client.submit(user_script), Dask Control, Filestore]
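
The `client.submit(user_script)` label matches the futures interface of `dask.distributed`, where `Client.submit` takes a callable. A minimal sketch of such a call, assuming a scheduler is already running and reachable; `user_script` is a hypothetical stand-in function here, and how Dask Control actually wraps the user's script is not specified in these slides:

```python
def user_script(n):
    """Hypothetical stand-in for the user's payload: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_on_cluster(scheduler_address, n):
    """Submit user_script to an already running Dask scheduler and wait for the result."""
    # Imported lazily so this module can be loaded where dask.distributed
    # is not installed (an assumption made for the sketch only).
    from dask.distributed import Client

    with Client(scheduler_address) as client:
        future = client.submit(user_script, n)  # returns a Future immediately
        return future.result()                  # blocks until a worker has finished
```

A call such as `run_on_cluster("tcp://dask-scheduler:8786", 10)` would then run the function on a worker; the host name is a placeholder and 8786 is the default Dask scheduler port.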

5 of 5

Status

  • Created “dask-cluster” on GCP
    • Created a node pool (“pool-1”) with e2-standard-8 machines, i.e. a configuration similar to the “dask-basic” cluster (the cluster used for Dask user evaluation)
    • Used for initial tests (run from my laptop)
  • Bits and pieces of code written for various tests, reusing existing code as far as possible
    • E.g. installation of dask scheduler + workers on the fly
      • This code might become the Dask Control code
    • Horovod model in Harvester and PanDA Pilot to be reused/reworked/extended into a Dask model
    • Currently manually testing code snippets that should end up in new dask_k8s_submitter in Harvester
      • Evaluate different solutions, e.g. HelmCluster vs KubeCluster, and autoscaling options (adaptive/staggered)
  • Related: Automatic Pilot image creation in progress
    • Action triggered by GitHub PR -> image pushed to GitLab PanDA container registry
    • “Pilot” image contains both PanDA Pilot and Rucio
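
As an illustration of starting Dask workers on the fly, the sketch below builds a Kubernetes pod manifest for one worker as a plain dictionary. It is not the planned `dask_k8s_submitter` code: the function name, labels, resource values and image are assumptions, and only the `dask-worker` command-line options shown are standard Dask ones.

```python
def dask_worker_pod(scheduler_address, image, nthreads=2, memory_gb=6):
    """Return a pod manifest (as a dict) for a single Dask worker.

    All names and default values here are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "generateName": "dask-worker-",
            "labels": {"app": "dask", "component": "worker"},
        },
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask-worker",
                    "image": image,
                    "args": [
                        "dask-worker",
                        scheduler_address,
                        "--nthreads", str(nthreads),
                        "--memory-limit", f"{memory_gb}GB",
                        # Exit if the scheduler cannot be reached for 60 s.
                        "--death-timeout", "60",
                    ],
                    # Keep the Kubernetes limits in line with what the worker is told.
                    "resources": {
                        "requests": {"cpu": str(nthreads), "memory": f"{memory_gb}G"},
                        "limits": {"cpu": str(nthreads), "memory": f"{memory_gb}G"},
                    },
                }
            ],
        },
    }


# Placeholder scheduler address and a public Dask image, both assumptions.
pod = dask_worker_pod("tcp://dask-scheduler:8786", "daskdev/dask:latest")
```

A manifest of this shape could be handed to whichever cluster manager is eventually chosen, or applied directly through the Kubernetes API.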

config_default

[core]
account = paul@gcp4hep.org
project = gke-dev-311213

[compute]
zone = europe-west1-b
region = europe-west1
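
The `config_default` properties use an INI layout, so they can be read with Python's standard `configparser`. A minimal sketch, assuming the file content is exactly as shown; the helper name `load_gcloud_properties` is hypothetical:

```python
import configparser

# Content copied from the config_default listing.
CONFIG_TEXT = """\
[core]
account = paul@gcp4hep.org
project = gke-dev-311213

[compute]
zone = europe-west1-b
region = europe-west1
"""


def load_gcloud_properties(text):
    """Parse INI-style properties into a {section: {key: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


props = load_gcloud_properties(CONFIG_TEXT)
```

Test code could use such a lookup to pick up the project and zone instead of hard-coding them.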