1 of 51

GA4GH Connect 2022

GA4GH Cloud API Implementers Workshop

ga4gh.org

2 of 51

Agenda

  • Introductions and purpose
  • Implementer talks (~10 min each)
  • Discussion, issue generation, session summary


3 of 51

GA4GH WES and the Amazon Genomics CLI

W. Lee Pang, PhD

Principal Developer Advocate - HealthAI Genomics

Amazon Web Services


4 of 51

Implementation summary

  • GA4GH API(s) implemented:
    • Workflow Execution Service (WES)
  • Target user, use case, and usage pattern
    • Who: Research scientists new to AWS who want to scale genomics workflows beyond on-prem compute and use cloud datasets
    • Use cases and patterns
      • Lift existing workflow pipelines to the cloud
      • Collaborate across workflow languages
      • Simplified workflow engine deployment
      • Familiar CLI experience


6 of 51

What went well

  • GA4GH WES as a common API across multiple workflow engines simplifies the Amazon Genomics CLI by pushing workflow engine specifics into modular WES adapters
  • OpenAPI-generated Python server code worked effectively out of the box (OOTB)
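
The "modular WES adapters" idea above can be sketched roughly as follows. This is a minimal illustration, not the Amazon Genomics CLI code: the `WesAdapter` interface, the `InMemoryAdapter` class, the `ADAPTERS` registry, and the `toy-engine` name are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class WesAdapter(ABC):
    """Hypothetical interface: one subclass per workflow engine."""

    @abstractmethod
    def run_workflow(self, request: Dict[str, str]) -> str:
        """Start a run and return its run_id."""

    @abstractmethod
    def get_run_status(self, run_id: str) -> str:
        """Return a WES state string such as QUEUED or RUNNING."""


class InMemoryAdapter(WesAdapter):
    """Toy adapter standing in for a real engine integration."""

    def __init__(self) -> None:
        self._runs: Dict[str, str] = {}

    def run_workflow(self, request: Dict[str, str]) -> str:
        run_id = f"run-{len(self._runs) + 1}"
        self._runs[run_id] = "QUEUED"
        return run_id

    def get_run_status(self, run_id: str) -> str:
        return self._runs.get(run_id, "UNKNOWN")


# The caller depends only on the interface, never on engine specifics.
ADAPTERS: Dict[str, WesAdapter] = {"toy-engine": InMemoryAdapter()}


def submit(engine: str, request: Dict[str, str]) -> str:
    """Route a WES-style request to the adapter registered for an engine."""
    return ADAPTERS[engine].run_workflow(request)
```

Adding an engine then means adding one adapter class and one registry entry, with no change to the calling code.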


7 of 51

Challenges and pain points

  • Amazon Genomics CLI is written in Go. The OpenAPI-generated Go client for WES had errors, and we also needed to add the ability to make AWS SigV4-signed requests.

  • Most engines don’t support WES OOTB, so we had to write adapters. With the same amount of effort, we could have built a custom API specific to the needs of Amazon Genomics CLI.


8 of 51

Challenges and pain points

RunWorkflow

  • workflow_url:
    • What URL schemes are allowed? What expectations are there for handling TRS? Shouldn’t this be discoverable so the client knows what it can submit?
    • Added convention of using workflow bundles in S3 to enable submitting multi-file workflows from a local machine
  • workflow_params, workflow_engine_parameters
    • Vague definition in the spec as to what these could or should be
    • Engine parameters potentially leak underlying implementation details. Shouldn’t WES be engine agnostic?
  • tags
    • Nit: docs for what the structure should be are found only in GetRunLog
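
To make the fields discussed above concrete, here is a minimal sketch of assembling the text fields of a RunWorkflow form. It assumes the WES 1.0 convention that `workflow_params`, `tags`, and `workflow_engine_parameters` are sent as JSON-encoded strings. The function name `build_run_request` is hypothetical, and what goes inside each JSON object remains implementation specific, which is the vagueness the slide points out.

```python
import json
from typing import Any, Dict, Optional


def build_run_request(
    workflow_url: str,
    workflow_type: str,
    workflow_type_version: str,
    workflow_params: Dict[str, Any],
    tags: Optional[Dict[str, str]] = None,
    workflow_engine_parameters: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the text fields of a RunWorkflow form, all as strings."""
    form = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # Structured values travel as JSON-encoded strings in the form.
        "workflow_params": json.dumps(workflow_params),
    }
    if tags:
        form["tags"] = json.dumps(tags)
    if workflow_engine_parameters:
        form["workflow_engine_parameters"] = json.dumps(workflow_engine_parameters)
    return form
```

The resulting dictionary could be passed as form data to any HTTP client; file uploads via `workflow_attachment` are left out of this sketch.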


9 of 51

Challenges and pain points

GetRunLog

  • Potentially huge responses due to lots of tasks, lots of outputs, or both
  • run_log
    • Vague description: “log and other info”. Interpreted as the “overall” workflow log - i.e. logging, stdout, stderr from workflow engine specific to a workflow execution.
    • exit_code specified as int32: What if an exit code is not available or not an integer?
  • task_logs
    • Large list that does not support pagination
    • What should task “names” be? We used engine provided task name + AWS Batch Job ID
    • Same exit code issue as above - e.g. when a task is still running.
  • outputs
    • How should this be structured?
    • Not all engines explicitly identify workflow outputs
    • How to handle “side effect” files (see: nextflow run -with-report -with-trace)?


10 of 51

Suggestions for improvement

  • Prescriptiveness:
    • Work backwards from what a workflow runner needs rather than a common slice of what engines currently support
    • Don’t let engine implementation specifics leak into the API
    • Eliminate vagueness. Add more clarity to request parameters and response structures

  • Scalability:
    • Paginated endpoint for workflow tasks (e.g. /runs/{id}/tasks). Suggest adding {id, state, resources} to items
    • Allow for tasks and workflows to fail unusually - e.g. empty exit_codes
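
One possible shape for the suggested task listing is sketched below. No such endpoint exists in the WES version discussed here, so everything in it is hypothetical: the `TaskSummary` item, the offset-style `page_token`, and the `list_tasks` function. The optional `exit_code` covers tasks that are still running or that failed without producing one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TaskSummary:
    """Hypothetical item for a /runs/{id}/tasks listing."""
    id: str
    state: str
    resources: Dict[str, str]
    exit_code: Optional[int] = None  # None while running or when unavailable


def list_tasks(
    tasks: List[TaskSummary], page_size: int, page_token: str = ""
) -> Tuple[List[TaskSummary], str]:
    """Return one page of tasks and the next page token ('' on the last page)."""
    start = int(page_token) if page_token else 0
    page = tasks[start:start + page_size]
    next_start = start + len(page)
    next_token = str(next_start) if next_start < len(tasks) else ""
    return page, next_token
```

A client would keep calling `list_tasks` with the returned token until it comes back empty, so no single response has to carry every task.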


11 of 51

GA4GH WES at DNAstack

Patrick Magee

Senior Software Developer

DNAstack


12 of 51

Implementation summary

  • GA4GH API(s) implemented
    • Workflow Execution Service (WES)
    • Beacon
    • Data Connect
    • Data Repository Service
    • Passport
  • Target user, use case, and usage pattern
    • Predominantly researchers, but also some clinical research groups and private companies looking to standardize their bioinformatics processing


13 of 51

Architecture


14 of 51

What went well

  • API footprint is relatively small and we were able to implement the fundamentals quickly
  • When working with a single WES API, it was very easy to build tooling to support the most common API interactions.
  • Easy to layer Custom Authentication and Auditing


15 of 51

Challenges and pain points

  • Very poor interoperability with other GA4GH APIs and Passport. We needed to invent ways to pass information on how to authenticate with DRS, TRS, and other GA4GH services
  • Originally implemented for a multi-tenant system; we tried to stay true to the then-current requirement of having /ga4gh/wes/v1 at the root of the path. This did not scale
    • Recommendation: Allow WES APIs to be defined at an arbitrary level of the path
  • Listing runs requires state to be preserved between pages. This was technically impossible for us
    • Recommendation: do not require this in the spec
  • Not enough information available when listing runs to build a UI or other tooling.
    • Recommendation: Add additional properties to the RunStatus object
  • No filtering supported on Runs, meaning a user would need to list all runs to drill down to what they were interested in
  • RunLog implies that the workflow_params and outputs are JSON serializable


16 of 51

Challenges and pain points

  • Very poor interoperability with other GA4GH APIs. We needed to invent ways to pass information on how to authenticate with DRS, TRS, and other GA4GH services.

  • Originally implemented for a multi-tenant system; we tried to implement the requirement of having /ga4gh/wes/v1 at the root of the path. This did not scale, and we ultimately had to break the specification and root it at /api/{project}/ga4gh/wes/v1
    • Recommendation: Allow WES APIs to be defined at an arbitrary level of the path


17 of 51

Challenges and pain points

RunListing

  • Listing runs requires state to be preserved between pages as per the spec. This was technically impossible with the resource limits we had
    • Recommendation: do not require this in the spec
  • Not enough information available when listing runs to build a UI or other tooling.
    • Recommendation: Add additional properties to the RunStatus object
  • No filtering supported on Runs, meaning a user would need to list all runs to drill down to what they were interested in
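
One way to avoid server-side state between pages is to put the position into the page token itself. The sketch below is an illustration only, not the DNAstack implementation: the cursor format, the `state` filter argument, and the function names are all hypothetical. It combines a self-describing cursor with a simple state filter.

```python
import base64
import json
from typing import Dict, List, Optional, Tuple

Run = Dict[str, str]


def encode_cursor(last_run_id: str) -> str:
    """Pack the last run_id of a page into an opaque token."""
    raw = json.dumps({"after": last_run_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> str:
    """Recover the last run_id from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))["after"]


def list_runs(
    runs: List[Run], page_size: int, page_token: str = "",
    state: Optional[str] = None,
) -> Tuple[List[Run], str]:
    """Return one page of runs ordered by run_id, plus the next token."""
    ordered = sorted(runs, key=lambda r: r["run_id"])
    if state is not None:
        ordered = [r for r in ordered if r["state"] == state]
    if page_token:
        after = decode_cursor(page_token)
        ordered = [r for r in ordered if r["run_id"] > after]
    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    next_token = encode_cursor(page[-1]["run_id"]) if has_more else ""
    return page, next_token
```

Because the token carries everything needed to resume, any server instance can answer the next request. The trade-off is that pages are not a frozen snapshot: runs created between requests may appear in later pages.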


18 of 51

Challenges and pain points

RunWorkflow

  • No two WES APIs support executing workflows in exactly the same way. Idiosyncrasies of the underlying engines result in differences in how workflow_params, workflow_engine_parameters, and auth are structured
  • No way to build tooling to automate generation of inputs in a generalized way for WES since every
  • attachment names are supposed to include the file path in the form, but this is actually

GetRunLog

  • RunLog implies that the workflow_params and outputs are JSON serializable
  • There is no record of the attachments that were submitted with the workflow and no way to retrieve them, so the ability to rerun a workflow based on the RunLog is lost


19 of 51

Suggestions for improvement

  • Focus on improving conformance and uniformity across WES implementations
  • Focus on the end user and implement features which provide QOL improvements to them. Figure out what the user needs and make sure the spec checks most of those boxes
  • Focus on automation and discoverability of configuration. There is currently no way to form a WES request without a priori knowledge of the workflow and the engine itself, which makes it a challenge to build tooling.
  • Focus on scalability and stability


20 of 51

GA4GH WES for Cromwell

Sara Salahi, PhD, MBA

Senior Product Manager

Broad Institute, Data Science Platform


21 of 51

Implementation summary

  • GA4GH API(s) implementation investigation
    • Workflow Execution Service (WES)
  • Target user (who)
    • Computational biologists and research scientists who want to run workflows on cloud (starting with Azure) using Cromwell App (and in the future other execution engines)
  • Use case
    • Users can submit workflows written in any language to run on cloud regardless of the workflow execution service
    • Easy adoption of new workflow execution engines into the platform


22 of 51

High Level Concept


23 of 51

Where are we now?


24 of 51

High Level Challenges and Pain points

  • Adding functionality into WES as needed based on real use cases in quick iterations

  • Wider adoption and language support (example: Nextflow)
    • Recommendation: Let engines declare their own language support.


25 of 51

Suggestions for improvement

  • Remove the workflow_attachment bundle: for engines like Cromwell that forward to GCP or Azure, it makes no sense to accept input files via the API.
    • Recommendation: Assume every script, reference, and input should be fetched at runtime, not uploaded at request time.
  • Simplifying the URL structure and enabling clients to retry sending workflows without worrying about running them twice
    • Recommendation: Make clients specify IDs in requests
  • Simplifying URL structure:
    • Recommendation: Switching /cancel from POST to DELETE would allow us to use the same request URL with different verbs for all of POST, GET and DELETE.
  • More lower level information via WES to render a good monitoring/debugging UI
  • Adding labels and ability to identify priorities


26 of 51

GA4GH TRS and WES in Dockstore

Denis Yuen

Team Lead

Ontario Institute for Cancer Research


29 of 51

Implementation Summary

dockstore.org web service implements TRS

CLI and UI are clients for TRS
CLI is also a client for WES

  1. TRS
  2. WES

30 of 51

TRS

  • TRS (Tool Registry Service) is used to read workflow and/or tool data in a read-only fashion
    • Users can retrieve workflow descriptors, container image IDs, and checksums
    • The latest final release is 2.0.0; the current develop branch is queued up with work for 2.0.1, which includes backwards-compatible changes like
      • Service-info
      • Galaxy, Snakemake
      • Zip bundles
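
As an illustration of the read-only retrieval described above, the sketch below builds the URL for a TRS v2 descriptor request. It assumes the TRS 2.0 path layout `/tools/{id}/versions/{version_id}/{type}/descriptor`. The function name, the example base URL, and the example tool ID are hypothetical.

```python
from urllib.parse import quote


def trs_descriptor_url(base_url: str, tool_id: str, version_id: str,
                       descriptor_type: str) -> str:
    """Build the URL of a descriptor for one tool version."""
    return "/".join([
        base_url.rstrip("/"),
        "tools",
        # Tool IDs may contain '/' and '#', so encode them as one path segment.
        quote(tool_id, safe=""),
        "versions",
        quote(version_id, safe=""),
        descriptor_type,
        "descriptor",
    ])
```

For example, a base of `https://example.org/ga4gh/trs/v2` and the tool ID `#workflow/github.com/org/repo` give a path segment of `%23workflow%2Fgithub.com%2Forg%2Frepo`.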

31 of 51

TRS

  • Current TRS providers (implementing the API server-side)
    • dockstore.org
    • workflowhub.eu
  • Current TRS users (retrieving workflow data)
    • dockstore.org UI and CLI
    • WfExS-backend
    • Terra (NIH - AnVIL, BD Catalyst)
    • cwltool
    • Seven Bridges (CGC, Cavatica, BD Catalyst)
    • Galaxy
    • DNAstack workbench

32 of 51

TRS

  • What went well: No glaring/burning issues so far but …
  • Possible extensions and areas of interest
    • New tool classes (API definition is open-ended) can differ
    • New workflow languages (enum, issue opened to describe available languages via service-info)
    • Adding write features
    • Optional authenticated behaviour
  • Can be discussed more in Thursday roadmapping session

35 of 51

Dockstore 1.12 is a WES Client

# Run workflow on local machine
dockstore workflow launch --entry <entry> --json input.json

# Run workflow on WES server
dockstore workflow wes launch --entry <entry> --json input.json

# WES service configuration in ~/.dockstore/config
[WES]
url: https://mywesserver.com/ga4gh/wes/v1
authorization: Bearer MyBearerToken
type: bearer

36 of 51

Implementation

  • Challenges and painpoints
    • Dockstore CLI written in Java
    • Using client libraries generated from WES OpenAPI specification
    • Works, but still needs implementation specific customization

37 of 51

Authentication

[WES]
url: https://mywesserver.com/ga4gh/wes/v1
authorization: Bearer MyBearerToken
type: bearer

[WES]
url: https://mywesserver.com/ga4gh/wes/v1
# AWS profile named default:
authorization: default
type: aws

  • Dockstore config file assumed a static auth value (token, basic auth, etc.)
  • Amazon Genomics CLI (AGC) uses AWS SigV4 signing
    • Dynamic, based on request and time
  • Solution
    • Dockstore reads the AWS credentials, generates the AWS auth header, and hacks it back into the generated libraries
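
To show why a static config value cannot hold this kind of auth, here is a minimal standard-library sketch of SigV4 header generation. It is neither the Dockstore nor the AGC implementation, and production code would normally rely on an AWS SDK signer instead. It assumes a simple path with no query string and signs only the `host` and `x-amz-date` headers. The function name `sigv4_headers` and all example values are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Dict, Optional
from urllib.parse import urlparse


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method: str, url: str, region: str, service: str,
                  access_key: str, secret_key: str,
                  now: Optional[datetime] = None,
                  body: bytes = b"") -> Dict[str, str]:
    """Return headers that sign one request at one point in time (UTC)."""
    now = now or datetime.now(timezone.utc)
    parsed = urlparse(url)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Step 1: canonical request (simple path, empty query string assumed).
    canonical_headers = f"host:{parsed.netloc.lower()}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join([
        method, parsed.path or "/", "", canonical_headers,
        signed_headers, hashlib.sha256(body).hexdigest(),
    ])

    # Step 2: string to sign, scoped to date, region, and service.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Step 3: derive the signing key from the secret, then sign.
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    signature = hmac.new(
        key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    return {
        "x-amz-date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
```

Since the timestamp, method, path, and body hash all feed the signature, the header has to be recomputed for every request, unlike a bearer token stored in a config file.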

38 of 51

Run Request Payloads Are Different

  1. AGC requires input.json be “wrapped”, DNAstack does not
    1. dockstore workflow wes launch --entry <entry> --json agcWrapper.json -a input.json
    2. dockstore workflow wes launch --entry <entry> --json input.json --inline-workflow
  2. Descriptors in request passed differently
    • AGC takes the (TRS) URL of the primary descriptor
    • DNAstack requires the descriptors be inlined

39 of 51

Suggestions for improvement

  1. Allow only one way to submit a request*, e.g.,
    1. Either URL to descriptor or descriptors inlined
    2. Format for input standardized
    3. * or implementations support both
  2. Or, API (service-info?), should clearly indicate what is supported
  3. Auth – there may not be a good solution
    • Platforms vary, so it may not be realistic to expect generated client libraries to handle this
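
A client-side preflight check against service-info, in the spirit of suggestion 2, might look like the sketch below. The `workflow_type_versions` structure follows the WES 1.0 service-info response. The `workflow_url_schemes` key is NOT part of the spec and is invented here only to show what a discoverable setting could look like. The function name `preflight` is also hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import urlparse


def preflight(service_info: Dict[str, Any], workflow_type: str,
              version: str, workflow_url: str) -> List[str]:
    """Return a list of problems found before submitting a run."""
    problems: List[str] = []

    # Advertised by WES 1.0 service-info: languages and their versions.
    versions = (
        service_info.get("workflow_type_versions", {})
        .get(workflow_type, {})
        .get("workflow_type_version", [])
    )
    if version not in versions:
        problems.append(f"{workflow_type} {version} is not advertised")

    # HYPOTHETICAL field, not in the spec: accepted workflow_url schemes.
    schemes = service_info.get("workflow_url_schemes")
    scheme = urlparse(workflow_url).scheme
    if schemes is not None and scheme not in schemes:
        problems.append(f"URL scheme '{scheme}' is not advertised")

    return problems
```

An empty list means the request matches what the server advertises; anything else can be shown to the user before a run is ever submitted.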

45 of 51

Discussion

