DRS Alignment with Beacon and Search
GA4GH Connect 2021
Chairs: Max Barkley, Brian O’Connor, Miro Cupak
ga4gh.org
Welcome!
Agenda
Search API Introduction
Miro Cupak, Jonathan Fuerth
Search
Features
Existing ecosystem
GA4GH Search
GET /tables
GET /table/{tableName}/info
GET /table/{tableName}/data
POST /search (optional)
Relational database
JSON files in a bucket
VCF+TBI
Files
Google Sheets
Phenopackets
CSV/TSV files with data dictionaries
Other vaguely rectangular files or APIs
Data Explorer
Jupyter Notebook
Command Line Interface
Beacon
GA4GH Search (federation)
Google BigQuery
Other applications
R data frame
FHIR
Summary
API overview
Specification overview: /tables
GET /tables
{
"tables": [
{
"name": "drs",
"data_model": {
"$ref": "https://example.com/table/drs/info"
}
},
// more tables
],
"pagination": {
"next_page_url": "https://example.com/tables/catalog/search_drs"
}
}
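A client that wants every table keeps following pagination.next_page_url until it disappears. The sketch below assumes responses shaped like the /tables example above and injects a hypothetical fetch_json(url) callable (in practice an HTTP GET returning parsed JSON) so it runs without a server; the "samples" table is made up for illustration.

```python
from typing import Callable, Dict, Iterator

def iter_tables(first_url: str, fetch_json: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every table entry, following pagination.next_page_url until it is absent.

    fetch_json is a hypothetical stand-in for an HTTP GET that returns parsed JSON.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("tables", [])
        # No pagination object (or no next_page_url) means this was the last page.
        url = (page.get("pagination") or {}).get("next_page_url")


# Canned pages stand in for a live server.
pages = {
    "https://example.com/tables": {
        "tables": [{"name": "drs"}],
        "pagination": {"next_page_url": "https://example.com/tables/catalog/search_drs"},
    },
    "https://example.com/tables/catalog/search_drs": {"tables": [{"name": "samples"}]},
}
names = [t["name"] for t in iter_tables("https://example.com/tables", pages.__getitem__)]
print(names)  # ['drs', 'samples']
```

The same loop applies to paged /table/{tableName}/data and /search responses; only the key holding the rows changes.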
Specification overview: /table/{id}/info
GET /table/drs/info
{
"name": "drs",
"description": "Table / directory containing DRS links",
"data_model": {
"$id": "https://example.com/table/drs/info",
"$schema": "http://json-schema.org/draft-07/schema#",
"properties": {
"id": {
"type": "string",
"description": "An identifier specific for this DRS object"
},
"file": {
"$ref": "https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/swagger.json#/definitions/DrsObject"
}
// more attributes
}
}
}
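Because the data_model is plain JSON Schema, a client can discover which columns hold DRS objects by inspecting each property's $ref. This is a minimal sketch; matching the "#/definitions/DrsObject" suffix is a heuristic assumption, and a complete client would resolve the $ref properly.

```python
from typing import Dict, List

DRS_OBJECT_SUFFIX = "#/definitions/DrsObject"

def drs_columns(table_info: Dict) -> List[str]:
    """Names of columns whose JSON Schema $ref points at the DRS DrsObject definition."""
    properties = table_info.get("data_model", {}).get("properties", {})
    return [
        name
        for name, schema in properties.items()
        if schema.get("$ref", "").endswith(DRS_OBJECT_SUFFIX)
    ]


info = {
    "name": "drs",
    "data_model": {
        "properties": {
            "id": {"type": "string"},
            "file": {"$ref": "https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/swagger.json#/definitions/DrsObject"},
        }
    },
}
print(drs_columns(info))  # ['file']
```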
Specification overview: /table/{id}/data
GET /table/drs/data
{
"data_model": // skipped for brevity, see previous slide
"data": [
{
"id": "file-001",
"name": "file-001.txt",
"size": 100,
"created": "2019-01-01T12:00:01Z",
"checksums": [],
"access_methods": [
{
"type": "https"
}
]
}, //more rows
]
}
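Rows returned from /table/drs/data carry DRS object ids, which a client can turn into hostname-based DRS URIs of the form drs://<hostname>/<id>. A minimal sketch, assuming the ids belong to a DRS server at the made-up host drs.example.org; percent-encoding the id is a defensive choice here.

```python
from typing import Dict, List
from urllib.parse import quote

def drs_uris(rows: List[Dict], drs_host: str) -> List[str]:
    """Build hostname-based DRS URIs (drs://<host>/<id>) for each row's DRS object id."""
    # safe="" makes quote() percent-encode "/" and other reserved characters in an id.
    return [f"drs://{drs_host}/{quote(row['id'], safe='')}" for row in rows]


rows = [{"id": "file-001", "name": "file-001.txt", "size": 100}]
print(drs_uris(rows, "drs.example.org"))  # ['drs://drs.example.org/file-001']
```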
Specification overview: /search
POST /search
{
"query": "SELECT * FROM drs"
}
{
"data": [
{
"id": "file-001",
"name": "file-001.txt",
"size": 100,
...
},
//more rows
],
"pagination": {
"next_page_url": ...
}
}
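The POST /search call is an ordinary JSON request carrying the SQL text. This sketch only builds the request with the standard library (the base URL is an assumption); sending it would be urllib.request.urlopen(req), and later pages would be fetched by following pagination.next_page_url as with /tables.

```python
import json
import urllib.request

def build_search_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST /search request carrying an SQL query."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_search_request("https://example.com", "SELECT * FROM drs")
print(req.get_method(), req.full_url)  # POST https://example.com/search
```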
How can Search complement DRS?
Data Discovery
Cohort Selection
Workflows
Results
Data that includes DRS URLs somewhere
Search API
Publish non-blob outputs & run stats as Search Tables
DRS Server(s)
Resolve DRS URLs to data
Cohort: a Search table containing DRS URLs and relevant Subject/Sample info
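In this flow the cohort is a Search table, and each DRS URL in it is resolved by a DRS server. For hostname-based DRS URIs the resolution step maps drs://<host>/<id> to https://<host>/ga4gh/drs/v1/objects/<id>. A minimal sketch; the column names sample_id and drs_uri are hypothetical, and compact identifier-based DRS URIs are deliberately rejected rather than resolved.

```python
from urllib.parse import urlparse

def drs_object_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to its DRS v1 object endpoint."""
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


# Hypothetical cohort rows as they might come back from a Search query.
cohort = [
    {"sample_id": "S1", "drs_uri": "drs://drs.example.org/file-001"},
    {"sample_id": "S2", "drs_uri": "drs://drs.example.org/file-002"},
]
for row in cohort:
    print(row["sample_id"], drs_object_url(row["drs_uri"]))
```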
DRS Alignment with Search
Ian Fore
DRS Scaling - GitHub Tickets under 342
Topics
DRS as a physical-level vs logical-level protocol
Bundling/Search - Imaging
Data for a DRS id
Research Objects
FASP & Process
Physical vs Logical - and bundling
Conclusions - Physical vs Logical
DRS's primary value is as a low-level physical protocol
Rather than for logical-level constructs (project, experiment run, etc.)
Generally valid (see the imaging examples)
Higher-level application questions should use schemas and models specific to the domain being supported
This follows much existing practice among GA4GH participants
It does not preclude higher-level schemas from being referenced by, or even included within, DRS bundles
Bulk operations and pagination in DRS are needed
At the fundamental level, not to handle application/logical-level concepts
“The thin middle”
Bob Grossman - Data Commons Framework Services
Carole Goble - Research Objects
Based on: Carole Goble - Research Objects
DRS
SRA Model
Imaging models
Search
Metadata for a DRS ID
Conclusions - Metadata for a DRS ID
Research Objects - RO
Carole Goble, Stian Soiland-Reyes
Recording
https://www.youtube.com/watch?v=pz-MLdI7GLA
Slides
https://www.dropbox.com/s/jnklvkznp546fnx/2021-02-25-ro-crate-fdo.pdf?dl=0
RO-Crate Presentation
Research Objects Relevance to DRS and Search
ROs have addressed areas identified as needs in DRS:
Manifests, if needed
Typing
Schema overlap: Search and SchemaBlocks
RO is opinionated about schemas
Broader applicability than genomics and health
What is FASP?
Summary
DRS Alignment with Beacon V2
Jordi Rambla
How Beacon has addressed the link to DRS
Handover mechanism
Declaring a handover type, label, URL, notes (details and considerations)
Available at different levels
Handover in the Beacon+ UI
A handover example in the JSON response
Handover definition
Beacon 1.1
handoverType:
  $ref: '#/components/schemas/HandoverType'
note:
  type: string
  description: An optional text including considerations on the handover link provided.
  example: "This handover link provides access to a summarized VCF. To access the VCF containing the details for each sample filling an application is required. See Beacon contact information details."
url:
  type: string
  description: URL endpoint to where the handover process could progress (in RFC 3986 format).
  example: "https://api.mygenomeservice.org/handover/9dcc48d7/"
HandoverType:
  description: Handover type, as an Ontology_term object with CURIE syntax for the "id" value.
  id:
    type: string
    description: Use "CUSTOM" for the "id" when no ontology is available.
    example: "EFO:0004157"
  label:
    type: string
    description: This would be the "preferred Label" in the case of an ontology term.
    example: "BAM format"
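The Beacon 1.1 handover definition maps naturally onto two small records. This sketch mirrors the fields listed above (handoverType with id and label, url, optional note); the CURIE check and the example DRS handover, with its "DRS object" label and URL, are illustrative assumptions rather than anything Beacon 1.1 defines.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class HandoverType:
    id: str     # CURIE such as "EFO:0004157", or "CUSTOM" when no ontology term exists
    label: str  # preferred label of the term, e.g. "BAM format"

    def __post_init__(self) -> None:
        if self.id != "CUSTOM" and ":" not in self.id:
            raise ValueError(f"id must be a CURIE or 'CUSTOM', got {self.id!r}")


@dataclass
class Handover:
    handoverType: HandoverType
    url: str                    # RFC 3986 URL where the handover can proceed
    note: Optional[str] = None  # optional considerations about the link

    def to_json(self) -> dict:
        data = asdict(self)     # asdict recurses into the nested HandoverType
        if data["note"] is None:
            del data["note"]    # omit the optional field rather than emitting null
        return data


# Hypothetical DRS handover using "CUSTOM" because no ontology term is assumed to exist.
drs_handover = Handover(
    handoverType=HandoverType(id="CUSTOM", label="DRS object"),
    url="https://drs.example.org/ga4gh/drs/v1/objects/file-001",
)
print(drs_handover.to_json())
```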
An example leveraging a Phenopacket response
How Beacon has addressed the link to DRS - revisited
Handover mechanism
Declaring a handover type (DRS type), label, URL, notes (details and considerations)
Available at different levels
Networks of DRS Services?
Jordi Rambla
Question brainstorming
The goal of Discovery is to allow the discovery of resources (e.g. variants, cases…)
There are other resources, such as files, execution services, etc.
In the same way that Beacon shines in the context of a network, could other GA4GH service types benefit from a network,
...and in particular from a smart one?
Screen capture courtesy of Jonathan Dursi - CanDiG
ELIXIR Beacon Network specification
[Diagram: three network topologies of registries (R1 to R3), Beacon aggregators (Aggr.1, BA1 to BA3) and Beacons (B1 to B4): Hierarchy, Flat, P2P]
ELIXIR Beacon DP and GA4GH
[Diagram: Flat topology with Registry 1, Beacon Aggr.1 and Beacons B1 to B4]
The role of the Aggregator
[Diagram: Flat topology with Registry 1, Beacon Aggr.1 and Beacons B1 to B4]
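In the flat topology the aggregator fans a single query out to every Beacon listed in the registry and merges the answers. A minimal sketch, assuming each Beacon is modelled as an in-process callable returning a yes/no answer (real Beacons would be HTTP endpoints) and that an unreachable Beacon should be reported as "no answer" rather than failing the whole query; the query fields are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

Beacon = Callable[[dict], bool]

def aggregate(registry: Dict[str, Beacon], query: dict, timeout: float = 5.0) -> Dict[str, Optional[bool]]:
    """Send one query to every registered beacon in parallel and collect the answers.

    A beacon that raises or times out is recorded as None instead of failing the whole query.
    """
    results: Dict[str, Optional[bool]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(registry))) as pool:
        futures = {name: pool.submit(beacon, query) for name, beacon in registry.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                results[name] = None
    return results


def unreachable(query: dict) -> bool:
    raise ConnectionError("beacon did not respond")

# In-process stand-ins for B1..B4.
registry = {
    "B1": lambda q: q["start"] == 12345,
    "B2": lambda q: False,
    "B3": unreachable,
    "B4": lambda q: True,
}
answers = aggregate(registry, {"referenceName": "1", "start": 12345, "alternateBases": "T"})
print(answers)  # {'B1': True, 'B2': False, 'B3': None, 'B4': True}
```

Keeping per-Beacon answers, rather than a single OR, lets the aggregator report which nodes responded.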
DRS+Passport Summary
Max Barkley
The Reality Today
[Diagram: Analysis, Results, Biomedical, Platform UI, connected by arrows numbered 1 to 4]
How do we authenticate for API access?
Next Step
We want this flexibility (and more)
This is harder to do if each arrow uses a different authorization mechanism
[Diagram: Datasets, Analysis, Results, Workflow API and a Standard Client, connected by arrows numbered 1 to 4]
Authorization in DRS
DRS, like some other GA4GH standards, recommends using OAuth2 bearer tokens (these standards predate the Passports spec)
It does not prescribe:
It’s good that we started off simple! But now there are use cases that require answering these questions in an automated way
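With plain OAuth2, a DRS client simply attaches a bearer token to the object request; how that token is obtained is exactly what the spec leaves open. A minimal sketch that only builds the request, assuming a made-up server URL, object id and token.

```python
import urllib.request

def drs_object_request(base_url: str, object_id: str, access_token: str) -> urllib.request.Request:
    """Build (not send) a GET for a DRS v1 object, authorised with an OAuth2 bearer token."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/ga4gh/drs/v1/objects/{object_id}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


req = drs_object_request("https://drs.example.org", "file-001", "example-token")
print(req.full_url)                     # https://drs.example.org/ga4gh/drs/v1/objects/file-001
print(req.get_header("Authorization"))  # Bearer example-token
```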
DRS-Passport Use Case
DRS serves objects from multiple datasets
Datasets have different authorities granting access
Need a passport with particular visas signed by particular authorities
Possibly also from a particular broker
[Diagram: a DRS server holding Dataset 1 and Dataset 2; Authorities 1, 2 and 3; Passport Brokers 1 and 2]
Where do I get a Passport?
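The use case reduces to a check: for each dataset, does the user hold a visa with the right value, signed by the right authority? A minimal sketch, assuming visas are already decoded and signature-verified, that access grants use the ControlledAccessGrants visa type, and that the dataset-to-(issuer, value) requirements are known; all issuers and values are hypothetical.

```python
from typing import Dict, List, Tuple

def missing_datasets(visas: List[dict], required: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return datasets not covered by a ControlledAccessGrants visa from the required issuer.

    visas: decoded visa claims (signature checks assumed done elsewhere).
    required: dataset -> (issuer that must sign the visa, visa value granting access).
    """
    granted = {
        (visa.get("iss"), visa.get("ga4gh_visa_v1", {}).get("value"))
        for visa in visas
        if visa.get("ga4gh_visa_v1", {}).get("type") == "ControlledAccessGrants"
    }
    return sorted(ds for ds, need in required.items() if need not in granted)


required = {
    "dataset-1": ("https://authority1.example.org", "https://example.org/datasets/1"),
    "dataset-2": ("https://authority2.example.org", "https://example.org/datasets/2"),
}
visas = [{
    "iss": "https://authority1.example.org",
    "ga4gh_visa_v1": {"type": "ControlledAccessGrants", "value": "https://example.org/datasets/1"},
}]
print(missing_datasets(visas, required))  # ['dataset-2']
```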
DRS-Passport: What we need
We need to expand the API so a client knows where to go for passports
[Diagram: a DRS server holding Dataset 1 and Dataset 2; Authorities 1, 2 and 3; Passport Brokers 1 and 2]
Go to broker 1 and ask for...
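If the API told the client which broker to use for each dataset, the client could plan one visit per broker. The per-dataset "broker"/"issuer" metadata below is entirely hypothetical and is not defined by DRS 1.1; the sketch only shows how a client might act on such information.

```python
from collections import defaultdict
from typing import Dict, List

def brokers_to_visit(access_info: Dict[str, Dict[str, str]], wanted: List[str]) -> Dict[str, List[str]]:
    """Group the requested datasets by the passport broker the server advertises for them."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for dataset in wanted:
        plan[access_info[dataset]["broker"]].append(dataset)
    return dict(plan)


# Hypothetical per-dataset access metadata a DRS server might expose.
access_info = {
    "dataset-1": {"broker": "https://broker1.example.org", "issuer": "https://authority1.example.org"},
    "dataset-2": {"broker": "https://broker2.example.org", "issuer": "https://authority3.example.org"},
}
plan_out = brokers_to_visit(access_info, ["dataset-1", "dataset-2"])
print(plan_out)
```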
Considerations
Several points need to be considered before crafting a solution:
Issues of Scale: Selections
Requesting DRS objects individually doesn’t scale to workloads on large datasets
We need to coalesce these requests into a single “selection” that can encompass many objects
[Diagram: the DRS server, Datasets 1 and 2, Authorities 1 to 3 and Passport Brokers 1 and 2, with many individual object requests (...)]
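Coalescing means grouping object ids into selections so that one request replaces many. A minimal sketch, assuming a hypothetical bulk request per selection and an arbitrary cap of 1,000 ids per selection.

```python
from typing import Iterable, Iterator, List

def selections(object_ids: Iterable[str], max_size: int = 1000) -> Iterator[List[str]]:
    """Coalesce per-object requests into selections of at most max_size ids."""
    if max_size < 1:
        raise ValueError("max_size must be positive")
    batch: List[str] = []
    for oid in object_ids:
        batch.append(oid)
        if len(batch) == max_size:
            yield batch
            batch = []
    if batch:  # remaining ids that did not fill a whole selection
        yield batch


ids = [f"obj{i}" for i in range(1, 2501)]
batches = list(selections(ids, max_size=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Three requests instead of 2,500 is the point; choosing the cap is itself a trade-off with payload size.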
Issues of Scale: Selections
Even with coalescing requests, selection lists can be large
We need to be careful about where we send these payloads and how verbosely we describe the selection
[Diagram: the DRS server, Datasets 1 and 2, Authorities 1 to 3 and Passport Brokers 1 and 2; a selection list /ob1, /ob2, …, /obj100000 with a question mark over where it should be sent]
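The verbosity concern is easy to quantify: listing 100,000 object paths explicitly produces a JSON payload of over a megabyte, while a structured description can be tiny. The prefix-plus-range form below is a hypothetical compact encoding that only works when ids follow a regular pattern.

```python
import json
from typing import Dict, List

# The same 100,000-object selection, described two ways.
ids = [f"/obj{i}" for i in range(1, 100_001)]
explicit = json.dumps({"objects": ids}, separators=(",", ":")).encode("utf-8")

# Hypothetical compact form: a shared prefix plus an inclusive numeric range.
range_desc = {"prefix": "/obj", "start": 1, "end": 100_000}
compact = json.dumps(range_desc, separators=(",", ":")).encode("utf-8")

def expand(desc: Dict) -> List[str]:
    """Recover the explicit id list from the compact range description."""
    return [f"{desc['prefix']}{i}" for i in range(desc["start"], desc["end"] + 1)]

print(f"explicit list: {len(explicit):,} bytes; range description: {len(compact)} bytes")
```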
Issues of Scale: Token Size
HTTP servers typically limit headers to 4 KB or 8 KB
Passport access tokens containing visas can exceed these limits
We need a proposal that allows Passport access tokens to be sent in the request body when they exceed header limits
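One possible client behaviour: use the Authorization header when the token fits, and fall back to a JSON body otherwise. This sketch assumes an 8 KB budget for the single header (real limits usually apply to all headers together) and a hypothetical {"passports": [...]} body accepted via POST; neither is defined by the material above.

```python
import json
import urllib.request

# Assumed header budget; servers commonly cap headers at 4 KB or 8 KB.
HEADER_LIMIT_BYTES = 8 * 1024

def passport_request(url: str, passport_token: str) -> urllib.request.Request:
    """Put the Passport token in the Authorization header if it fits, else in a JSON body."""
    header_value = f"Bearer {passport_token}"
    if len(header_value.encode("utf-8")) <= HEADER_LIMIT_BYTES:
        return urllib.request.Request(url, headers={"Authorization": header_value})
    # Too large for a header: POST it in the body instead (hypothetical body shape).
    body = json.dumps({"passports": [passport_token]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


url = "https://drs.example.org/ga4gh/drs/v1/objects/file-001"
small = passport_request(url, "a" * 100)
large = passport_request(url, "a" * 20_000)
print(small.get_method(), large.get_method())  # GET POST
```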
Scope of Credentials
Users may end up with passports containing many visas
They may need to down-scope by either:
[Diagram: the DRS server, Datasets 1 and 2, Authorities 1 to 3 and Passport Brokers 1 and 2; a second DRS server with Dataset 3; a Token]
I only want to access dataset 1?
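Whatever mechanism is chosen, down-scoping starts with selecting only the visas relevant to the requested data. This sketch shows just that selection step over decoded visas with hypothetical values; an actual solution would still need a broker or DRS server to issue or accept a narrower token.

```python
from typing import Iterable, List

def down_scope(visas: List[dict], wanted_values: Iterable[str]) -> List[dict]:
    """Keep only the visas whose ga4gh_visa_v1.value grants one of the wanted datasets."""
    wanted = set(wanted_values)
    return [v for v in visas if v.get("ga4gh_visa_v1", {}).get("value") in wanted]


visas = [
    {"iss": "https://authority1.example.org",
     "ga4gh_visa_v1": {"type": "ControlledAccessGrants", "value": "https://example.org/datasets/1"}},
    {"iss": "https://authority2.example.org",
     "ga4gh_visa_v1": {"type": "ControlledAccessGrants", "value": "https://example.org/datasets/2"}},
]
scoped = down_scope(visas, ["https://example.org/datasets/1"])
print([v["iss"] for v in scoped])  # ['https://authority1.example.org']
```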
OAuth2 and Existing Systems
There are DRS implementations that already accept OAuth2 bearer tokens
Organizations exposing data over HTTP will have pre-existing authorization systems
Designing solutions compatible with existing standards, implementations, and libraries (where possible) makes them easier to adopt
[Diagram: a Token, an Authorization Domain, a DRS server with Dataset 1 and Dataset 2, and an Auth Server]
How do I handle these tokens in my existing auth system?