1 of 18

Total Perspective Vortex (TPV): Seamlessly managing job destinations in Galaxy when users can’t be trusted with resources

Sanjay K. Srikakulam

Albert-Ludwigs-Universität Freiburg

2 of 18

So, what exactly is Galaxy and what can it offer?

  • An open-source platform for scientific data analysis
  • A graphical web interface
    • for running >9,000 tools
  • A FAIR data platform
  • A powerful workflow system
  • Data import/export from/to various services
  • Supports AAI
  • Galaxy: a research data management (RDM) platform
  • Galaxy: a virtual research environment (VRE)
  • A community
    • Notably extending beyond the bio space
    • Materials science, astronomy, and more
  • 18 years and thriving

3 of 18

Distributing analysis across computing resources

https://pulsar-network.readthedocs.io

  • Democratizing and federating data analysis

4 of 18

Galaxy ecosystem

TIaaS

BYOS

BYOC

5 of 18

  • Infrastructure and compute
    • deNBI Cloud
    • OpenStack + Terraform + Ansible + CI/CD + TIG stack
    • > 95K registered users

UseGalaxy.EU

  • Stats
    • 106 TiB of data per month
    • ~1.3M jobs per month
    • ~1500 new users per month
    • A job finishes every 3 seconds

6 of 18

Total Perspective Vortex (TPV) - Motivation

7 of 18

Total Perspective Vortex

  • A Python framework that dynamically schedules millions of Galaxy jobs according to configurable rules.
  • Utilizes a shared YAML file/database for configuration, incorporating resource allocation rules from community experts.
  • TPV can access extensive metadata, including user objects, provenance information, data locality, size, etc.
  • Distributes jobs to remote resources such as ARC and Pulsar, making intelligent, dynamic decisions for optimal job routing.
  • Enhances scalability and efficiency in federated resource environments, providing fine-grained control over job scheduling and simplifying administration.

8 of 18

TPV – Configuring resource requirements for a tool

tools:
  bowtie2.*:
    cores: 6
    mem: cores * 4
    gpus: 0
    env: []

destinations:
  slurm:
    cores: 24
    mem: 64
    gpus: 1
  arc01:
    cores: 8
    mem: 32
    gpus: 0

TPV will route to a destination where the job will fit
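The "where the job will fit" check can be sketched in plain Python. This is a minimal illustration of the idea, not TPV's actual implementation; the `fits` helper and the dictionaries are assumptions mirroring the config above.

```python
# Illustrative sketch (not TPV's real code): keep only destinations whose
# capacity covers the tool's resource request.

def fits(tool, dest):
    """True if the destination can satisfy the tool's cores/mem/gpus request."""
    return (tool["cores"] <= dest["cores"]
            and tool["mem"] <= dest["mem"]
            and tool["gpus"] <= dest["gpus"])

tool = {"cores": 6, "mem": 6 * 4, "gpus": 0}   # bowtie2.* from the config above
destinations = {
    "slurm": {"cores": 24, "mem": 64, "gpus": 1},
    "arc01": {"cores": 8, "mem": 32, "gpus": 0},
}

candidates = [name for name, dest in destinations.items() if fits(tool, dest)]
print(candidates)  # → ['slurm', 'arc01'] — both destinations can fit this job
```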

9 of 18

TPV – Intelligent resource selection

arc01 would have been a better fit because of the highmem tag, but it is currently marked offline, so slurm it is.

tools:
  bowtie2.*:
    cores: 12
    mem: cores * 4
    gpus: 0
    env: []
    scheduling:
      require: []
      prefer:
        - highmem
      accept:
      reject:
        - offline
    rules: []

destinations:
  slurm:
    cores: 16
    mem: 64
    gpus: 2
    scheduling:
      prefer:
        - general
  arc01:
    cores: 16
    mem: 64
    gpus: 0
    scheduling:
      prefer:
        - highmem
      reject:
        - offline
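The require/prefer/accept/reject matching can be approximated with set operations. This is a simplified sketch of the idea, not TPV's actual matching algorithm; `schedulable` and `rank_by_preference` are hypothetical names.

```python
# Hedged sketch of TPV-style tag scheduling (simplified): drop destinations
# that lack a required tag or carry a rejected tag, then sort so destinations
# matching more "prefer" tags come first.

def schedulable(tool_sched, dest_tags):
    if set(tool_sched.get("require", [])) - set(dest_tags):
        return False                      # missing a required tag
    if set(tool_sched.get("reject", [])) & set(dest_tags):
        return False                      # carries a rejected tag
    return True

def rank_by_preference(tool_sched, destinations):
    prefer = set(tool_sched.get("prefer", []))
    ok = {n: tags for n, tags in destinations.items()
          if schedulable(tool_sched, tags)}
    # destinations sharing more preferred tags sort first
    return sorted(ok, key=lambda n: -len(prefer & set(ok[n])))

tool_sched = {"prefer": ["highmem"], "reject": ["offline"]}
destinations = {
    "slurm": ["general"],
    "arc01": ["highmem", "offline"],      # currently marked offline
}
print(rank_by_preference(tool_sched, destinations))  # → ['slurm']
```

As the slide notes, arc01 would win on the `highmem` preference, but its `offline` tag eliminates it outright, leaving slurm.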

10 of 18

TPV – Routing rules and inheritance

tools:
  bowtie2:
    cores: 4
    mem: 16
    rules:
      - if: input_size > 10
        cores: 16
        mem: 32
      - if: input_size >= 20 and input_size <= 50
        scheduling:
          require:
            - highmem
      - if: input_size >= 55
        fail: Size (input_size) is too large

  • Embedded Python expressions
  • Conditional tags
  • Contextualized errors
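Rule evaluation can be sketched as overlaying each matching rule's settings onto the tool's defaults. This is an illustration of the concept only; TPV compiles its expressions safely, whereas the bare `eval` below (and the `{input_size}` placeholder syntax) are assumptions for demonstration.

```python
# Illustrative evaluation of TPV-style rules (simplified): each rule's "if"
# expression is evaluated against the job context; matching rules overlay
# their settings, and a "fail" rule aborts with a contextualized message.

rules = [
    {"if": "input_size > 10", "cores": 16, "mem": 32},
    {"if": "input_size >= 20 and input_size <= 50",
     "scheduling": {"require": ["highmem"]}},
    {"if": "input_size >= 55",
     "fail": "Size {input_size} is too large"},
]

def apply_rules(base, rules, context):
    entity = dict(base)
    for rule in rules:
        # NOTE: eval is for illustration only; TPV evaluates expressions itself
        if eval(rule["if"], {}, context):
            if "fail" in rule:
                raise RuntimeError(rule["fail"].format(**context))
            entity.update({k: v for k, v in rule.items() if k != "if"})
    return entity

print(apply_rules({"cores": 4, "mem": 16}, rules, {"input_size": 30}))
# → {'cores': 16, 'mem': 32, 'scheduling': {'require': ['highmem']}}
```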

11 of 18

TPV – Routing rules and inheritance

tools:
  bowtie2:
    cores: 4
    mem: 16
    rules:
      - if: input_size > 10
        cores: 16
        mem: 32
      - if: input_size >= 20 and input_size <= 50
        scheduling:
          require:
            - highmem
      - if: input_size >= 55
        fail: Size (input_size) is too large

tools:
  bwa_mem.*:
    inherits: bowtie2
    cores: 16
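The effect of `inherits` is that a child entry starts from its parent's settings and overrides selectively. A minimal sketch of that merge, assuming simple flat dictionaries (TPV's real resolution handles nesting and defaults):

```python
# Minimal sketch of TPV-style inheritance: "bwa_mem.*" inherits bowtie2's
# settings and overrides only cores.

tools = {
    "bowtie2": {"cores": 4, "mem": 16},
    "bwa_mem.*": {"inherits": "bowtie2", "cores": 16},
}

def resolve(name, tools):
    entry = dict(tools[name])
    parent = entry.pop("inherits", None)
    if parent:
        merged = resolve(parent, tools)   # resolve the parent first
        merged.update(entry)              # child keys win over parent keys
        return merged
    return entry

print(resolve("bwa_mem.*", tools))  # → {'cores': 16, 'mem': 16}
```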

12 of 18

TPV – Metascheduling

global:
  default_inherits: default

tools:
  default:
    cores: 1
    mem: cores * 3.8
    gpus: 0
    env: []
    params: []
    scheduling:
      reject:
        - offline
    rules: []
    rank: |
      final_destinations = helpers.weighted_random_sampling(candidate_destinations)
      final_destinations

Custom rank functions, written in Python, enable advanced metascheduling by selecting the best destination from the candidate list; if no rank function is provided, TPV defaults to the most preferred destination.
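The idea behind the `helpers.weighted_random_sampling` call above can be sketched as repeated weighted draws without replacement, so larger (or less loaded) destinations are picked first more often. This standalone version and its `weights` parameter are assumptions for illustration, not TPV's implementation.

```python
import random

# Hedged sketch of weighted-random ranking: reorder candidate destinations
# by repeated weighted draws without replacement.

def weighted_random_sampling(candidates, weights, rng=random):
    """Return candidates reordered by successive weighted random picks."""
    pool, w = list(candidates), list(weights)
    ordered = []
    while pool:
        pick = rng.choices(range(len(pool)), weights=w, k=1)[0]
        ordered.append(pool.pop(pick))
        w.pop(pick)
    return ordered

random.seed(0)
# e.g. weight destinations by core count: slurm (24 cores) vs arc01 (8 cores)
print(weighted_random_sampling(["slurm", "arc01"], [24, 8]))
```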

13 of 18

TPV – Current developments

[Diagram: jobs send job data to TPV; destinations report usage stats and metrics; TPV combines these with additional metadata and returns a best-destination recommendation.]

  • Optimize job scheduling with data proximity
  • Smart resource utilization
  • Edge intelligence at your fingertips

14 of 18

Summary

  • A flexible system that makes smart, dynamic decisions to optimize job routing and resource utilization.
  • Utilizes a shared YAML file/database for resource allocation rules, reducing admin overhead.
  • Collection of rules contributed by community experts, minimizing configuration repetitiveness.
  • Adjusts resources based on metadata, tools, and other criteria.
  • Capable of distributing jobs to remote resources such as ARC and Pulsar.
  • Enhances the user experience.

15 of 18

Thank you!

16 of 18

17 of 18

Object stores and Bring Your Own Storage

  • Object stores
      • User level
      • History level
      • Workflow level
      • Job level
  • Well annotated, classified, and visually explained
  • Bring your own storage
      • All data relevant to your job will be stored only on your storage
      • Vault to store your credentials
      • Secured data analysis is one of the use cases

18 of 18

Bring Your Own Compute

  • Deploy a Pulsar instance on your favorite cloud or your local computing infrastructure
    • Detailed documentation on how to, including recipes, is already available at https://github.com/usegalaxy-eu/pulsar-deployment
  • Connect it to Galaxy in a few simple steps
  • Use Galaxy to submit jobs to your compute infrastructure
  • Less maintenance and administration
  • Leverage the vast number of tools, workflows, and materials