#GA4GHConnect21
bit.ly/GA4GHConnect21
ga4gh.org
GA4GH Connect 2021: Housekeeping
Justina Chung
Standards for Professional Conduct
Participants in GA4GH meetings and activities must follow the GA4GH Standards for Professional Conduct:
bit.ly/ga-professional-conduct
Conflicts of Interest
Share verbally or type in the chat
Do any conflicts need to be addressed before moving forward?
Closed Captioning
Adjust the caption size in Windows, MacOS, or Linux:
Chrome OS / Windows / macOS | Web / Linux | Android / iOS Mobile Apps
bit.ly/GA-Zoom-CC
Staying Connected During the Meeting
bit.ly/GA4GHConnect21
bit.ly/join-GA-slack
GA4GH Connect Virtual Meeting Agenda
Welcome to the virtual lobby!
Ask Ewan (almost) Anything!!
STEP 1: Navigate to Sli.do in your web browser or download the Slido mobile app
STEP 2: Enter event code #GA4GH
STEP 3: Enter the Plenary room
STEP 4: Type in your question
STEP 5: Upvote others’ questions
NOTE: You may need to minimize your Zoom window to view both the meeting & Sli.do on your screen
Building Momentum on Implementation
GA4GH Starter Kit
15 min
THE OPPORTUNITY...
If we can enable secondary use of clinical genomic data for research, we will have a virtual cohort of >60 million samples by 2025.
The GA4GH Ecosystem
3000+ Subscribers
600+ Organizational Members
90+ Countries
24 Driver Projects
8 Work Streams
20 Technical Standards
7 Regulatory Policies & Frameworks
40+ Implementations & Deployments
Enabling the global learning health system
Approved Technical Standards
Cloud: Tool Registry Service API v2; Workflow Execution Service API v1; Task Execution Service API v1; Data Repository Service API v1
Large Scale Genomics: htsget v1; refget v1; Read File Formats; Variation File Formats; Crypt4GH v1; RNAget API v1
Clin/Pheno Data Capture: Phenopackets v1
Genomic Knowledge Standards: Variation Representation v1
Data Use & Researcher Identities: Data Use Ontology v1; GA4GH Passports v1 / AAI
Discovery: Beacon API v1; Service Info/Registry API v1
Learn more: ga4gh.org/toolkit
GA4GH 2020 Connection Demos
Driving improvements in future spec iterations based on real-world lessons
bit.ly/GA4GH-Anna
Model 1: Federated data hosting with data release to user
[Diagram: Database 1-4, each with Curation, Search, and Access; Analysis (x4); User; Meta-analysis; Presentation of Results]
Model 2: Federated analysis of independent resources
[Diagram: Database 1-4, each with Curation, Search, Access, and Analysis; Meta-analysis across cohorts; User]
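In Model 2 each resource runs its own analysis and only summary results are combined across cohorts. A minimal sketch of one common way to combine them, fixed-effect inverse-variance weighting, is shown below. The slide does not say which method the demos used, and the per-site numbers are made up for illustration.

```python
import math

def inverse_variance_meta(estimates):
    """Combine per-site (effect, standard_error) pairs with fixed-effect
    inverse-variance weighting. Only summary statistics leave each site."""
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical summary statistics returned by four databases.
site_results = [(0.10, 0.05), (0.20, 0.05), (0.15, 0.10), (0.05, 0.10)]
pooled_effect, pooled_se = inverse_variance_meta(site_results)
print(f"pooled effect = {pooled_effect:.3f}, SE = {pooled_se:.3f}")
```

Sites with smaller standard errors get more weight, so the two precise sites dominate the pooled estimate here.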
Model 3: Federated analysis of integrated resources
[Diagram: Database 1-4, each with Curation, Search, and Analysis; Control + Meta Analysis; Control, direction; User]
Connection Demo Implementers
BigQuery
GA4GH Starter Kit
Goal: To develop a suite of out-of-the-box, modular, open source community implementations to lower the barrier to genomics interoperability
Audience: Large research organizations and collaborative consortia, as well as smaller research and clinical labs
GA4GH Interop “Nirvana”
Cloud First, Full Stack, FASP demos
Discussion, Work Streams,
Reference implementations
HPC compatible, Cloud compatible, modular reference implementation Starter Kit
Real world problems from Driver Projects + GA4GH Community
Announcing GA4GH
Chief Standards Officer
We are excited to have Dr. Susan Fairley, PhD, join GA4GH as the new Chief Standards Officer based at EMBL-EBI!
Thank You
Thank you for participating while:
GA4GH has an amazing community and this is an opportunity for us to support each other.
The past year has not been normal...
Meeting Goals
GA4GH Work Streams, FASP and EDI Advisory Group
Regulatory & Ethics
Yann Joly
5 min + 2 min Q&A
Regulatory & Ethics
Genetic Discrimination Observatory (GDO) - March 1 @ 21:00 UTC
Data Access Committee Review Standards (DACReS) - March 1 @ 22:00 UTC
General REWS Meeting - March 2 @ 12:00 UTC
Return of Results Policy - March 3 @ 12:00 UTC
REWS-EDI Alignment - March 3 @ 13:00 UTC
Audience Q&A Session
Data Security
Jean-Pierre Hubaux
5 min + 2 min Q&A
Data Security
Federated Analysis and Cloud Security - March 3 @ 21:00 UTC
Data Security Work Stream Meeting - March 4 @ 13:30 UTC
Audience Q&A Session
Large Scale Genomics
Oliver Hofmann & Thomas Keane
5 min + 2 min Q&A
Large Scale Genomics
VCF/VRS/Refget Alignment (March 1st, 22:00 - 23:00 UTC)
VRS/VCF/Refget/Sequence Annotation teams
Common Terms & Data Models; Translating between VRS and VCF
Large Scale Genomics Work Stream (March 2nd, 13:30 - 15:00 UTC)
LSG participants, driver projects
Project updates
Re-starting Future of VCF initiative
Community engagement and finding maintainers
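Translating between VRS and VCF starts with coordinates: VCF positions are 1-based, while VRS uses inter-residue (0-based, half-open) coordinates. A minimal sketch of that conversion follows, assuming a simple substitution or indel record. The output dictionaries are a simplified illustration rather than the full VRS schema, the function names are hypothetical, and the sketch skips the normalization and refget-based sequence identifiers a real implementation needs.

```python
def vcf_to_interresidue(pos, ref, alt):
    """Convert a VCF (1-based POS, REF, ALT) record to inter-residue
    start/end coordinates plus the replacement sequence."""
    start = pos - 1          # 1-based position -> 0-based inter-residue start
    end = start + len(ref)   # half-open end covers the whole REF allele
    return {"start": start, "end": end, "state": alt}

def interresidue_to_vcf(start, end, state, ref_seq):
    """Reverse direction. ref_seq is the reference sequence as a string
    (hypothetical input, e.g. retrieved from a refget server)."""
    return {"pos": start + 1, "ref": ref_seq[start:end], "alt": state}

snv = vcf_to_interresidue(pos=5, ref="G", alt="A")
print(snv)  # {'start': 4, 'end': 5, 'state': 'A'}
print(interresidue_to_vcf(snv["start"], snv["end"], snv["state"], "ACGTGCA"))
```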
Large Scale Genomics
Key Management in the Cloud (March 2nd, 21:00 - 22:30 UTC)
LSG/Cloud/DURI, Driver Projects with interest in Crypt4GH
Handling of encryption key in cloud environments
Interaction of Crypt4GH and DRS
Sequence Annotation (March 3rd, 22:30 - 00:00 UTC)
LSG/GKS, Driver Projects
Discussion of SA scope
Feature exploration, managing entity relationships
Future of VCF crossroads
Population scale collections of genetic variation
Working group commenced in 2019
Possible scenarios:
Option 2 - requires serious engagement from the main variant callers and cohorts to achieve
Audience Q&A Session
Genomic Knowledge Standards
Andy Yates
Genomic Knowledge Standards
VRS 1.3 and Implementation Guidelines - March 1st @ 21:00 UTC
VRS/VCF/RefGet alignment - March 1st @ 22:00 UTC
Variation Annotation - March 1st @ 23:00 UTC
Phenopackets and VA/VR integration - March 2nd @ 22:30 UTC
Sequence annotation - March 3rd @ 22:30 UTC
Audience Q&A Session
Discovery
Michael Baudis
Discovery Toolbox
A suite of general purpose standards empowering data sharing networks
For Organizations
For Networks
Focus on adoption and powering specific use cases
Discovery
FASP Updates - March 1st @ 22:30 UTC
Discovery Work Stream - March 2 @ 12:00 UTC
Beacon v2 - March 2 @ 13:30 UTC
Phenopackets and Pedigree Integration with Beacon & Search API - March 3 @ 21:00 UTC
DRS Alignment with Beacon & Search - March 3 @ 23:30 UTC
SchemaBlocks {S}[B] - March 4 @ 12:30 UTC
Meeting Goals: Cross-Work Stream collaborations and internal alignments
Audience Q&A Session
Data Use & Researcher Identities
Jaime Guidry Auvil & Craig Voisin
Data Use
“A better DUO experience for users”
February 2021, new DUO release: hierarchy reorganised into “permissions” and “modifiers” that apply to these permissions.
Reflects input from driver projects and adopters, and aligns with our roadmap goal of improving documentation and guidance.
UX work in progress - interviews conducted and report being compiled
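A minimal sketch of how the permission/modifier split might be used when matching a request to a dataset. The DUO term IDs should be checked against the current release before use. The matching rule, the `ALLOWED_PURPOSES` table, and the `Dataset` and `is_compatible` names are hypothetical and far simpler than a real access review.

```python
from dataclasses import dataclass, field

# One "permission" term per dataset, plus zero or more "modifier" terms.
# IDs follow DUO; verify them against the current release before use.
GRU = "DUO:0000042"     # general research use (permission)
HMB = "DUO:0000006"     # health/medical/biomedical research (permission)
NPUNCU = "DUO:0000018"  # not for profit, non commercial use only (modifier)
IRB = "DUO:0000021"     # ethics approval required (modifier)

# Hypothetical, simplified table: request purposes each permission allows.
ALLOWED_PURPOSES = {GRU: {GRU, HMB}, HMB: {HMB}}

@dataclass
class Dataset:
    permission: str
    modifiers: set = field(default_factory=set)

def is_compatible(dataset, purpose, satisfied_modifiers):
    """True if the purpose is allowed and every modifier condition is met."""
    purpose_ok = purpose in ALLOWED_PURPOSES.get(dataset.permission, set())
    return purpose_ok and dataset.modifiers <= set(satisfied_modifiers)

cohort = Dataset(permission=HMB, modifiers={IRB, NPUNCU})
print(is_compatible(cohort, HMB, {IRB, NPUNCU}))  # True
print(is_compatible(cohort, HMB, {IRB}))          # False
print(is_compatible(cohort, GRU, {IRB, NPUNCU}))  # False
```

Keeping modifiers separate from the permission means new conditions can be attached to a dataset without redefining what research purposes it allows.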
Data Use
Improved documentation and outreach
DUO implementers as of February 2021
DUO Meetings
Researcher IDs (Passport)
Connect Meetings involving Passports
See the Passport Roadmap for more detailed goals over 2021
Update: final preparation of the Passport manuscript before submission
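For orientation, a GA4GH Passport carries a list of visas, each asserting something about the researcher. The sketch below works on already-decoded visa claims and checks for a `ControlledAccessGrants` visa covering a dataset. Signature verification, expiry, and `conditions` handling are deliberately left out, and the URLs and the `has_grant` helper are hypothetical.

```python
def has_grant(visas, dataset, trusted_sources):
    """Return True if any decoded visa grants controlled access to dataset
    and names a source that the data holder trusts."""
    for visa in visas:
        claim = visa.get("ga4gh_visa_v1", {})
        if (claim.get("type") == "ControlledAccessGrants"
                and claim.get("value") == dataset
                and claim.get("source") in trusted_sources):
            return True
    return False

# Hypothetical decoded visa payloads (normally carried as signed JWTs).
decoded_visas = [
    {"ga4gh_visa_v1": {"type": "AffiliationAndRole",
                       "asserted": 1609459200,
                       "value": "faculty@example.org",
                       "source": "https://idp.example.org"}},
    {"ga4gh_visa_v1": {"type": "ControlledAccessGrants",
                       "asserted": 1612137600,
                       "value": "https://data.example.org/datasets/710",
                       "source": "https://dac.example.org",
                       "by": "dac"}},
]

print(has_grant(decoded_visas,
                "https://data.example.org/datasets/710",
                {"https://dac.example.org"}))
```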
Audience Q&A Session
Cloud
David Glazer
Cloud
The Cloud Work Stream is focused on creating specific standards for defining, sharing, and executing portable workflows and accessing data across clouds.
Our API specifications
TRS
WES
TES
DRS
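As one concrete example of these APIs, DRS exposes object metadata at `/ga4gh/drs/v1/objects/{object_id}`. The sketch below turns a hostname-based `drs://` URI into that HTTPS URL without making any network call. The host and object ID are placeholders, the `drs_uri_to_url` name is hypothetical, and compact-identifier-style DRS URIs are not handled.

```python
from urllib.parse import urlparse, quote

def drs_uri_to_url(drs_uri):
    """Map drs://<hostname>/<id> to the DRS v1 object metadata URL."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri}")
    object_id = parsed.path.lstrip("/")
    if not object_id:
        raise ValueError("missing object id")
    # Percent-encode the ID so it is safe as a single path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"

print(drs_uri_to_url("drs://drs.example.org/314159"))
# https://drs.example.org/ga4gh/drs/v1/objects/314159
```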
Cloud
Connect Goals:
Cloud WS - March 1st @ 21:00 UTC
Key Management in the Cloud - March 2nd @ 21:00 UTC (with Large Scale Genomics WS)
DRS + Passports - March 3rd @ 13:30 UTC (with DURI WS)
DRS Alignment with Beacon and Search - March 3rd @ 22:30 UTC (with Discovery WS)
Audience Q&A Session
Clinical & Phenotypic Data Capture
David Hansen
Clinical & Phenotypic Data Capture
Phenopackets and VA/VR Integration
Clin/Pheno & GKS - March 2 @ 22:30 UTC
Next Steps/Future Directions: Computable Cohort Representation
Phenopackets and Pedigree Integration with Beacon and Search API
Clin/Pheno & Discovery - March 3 @ 21:00 UTC
Connect Goals: Integrating Clin/Pheno efforts with other Work Streams
Audience Q&A Session
FASP
Max Barkley
FASP
A cross-workstream effort to promote interoperability between our GA4GH standards
FASP Promotes
Data discovery, controlled access (DURI), and analysis (Cloud)
FASP
Connect Goals:
FASP - March 1st @ 22:30 UTC
DRS + Passports - March 3rd @ 13:30 UTC (Cloud and DURI)
DRS Alignment with Beacon and Search - March 3rd @ 22:30 UTC (Cloud and Discovery)
Audience Q&A Session
EDI Advisory Group
Melissa Konopko
EDI Advisory Group
“Equity, Diversity, and Inclusion is not a choice. It is the only way that we, as a global standards-setting organization, can proceed”
-Laura Paglione
EDI Advisory Group
Develop team inclusivity and diversity to support the creation of standards that meet the needs of the global community we represent.
EDI Workshop for Work Streams
Audience Q&A Session
External Initiatives: Opportunities for Collaboration
Medical Genome Initiative
Human Pangenome Reference Consortium
International Hundred-thousand Cohorts Consortium, Cohort Atlas
Medical Genome Initiative
Moving whole-genome sequencing for rare disease diagnosis to the clinic
Shashikant Kulkarni, M.S. (Medicine), PhD, FACMG
Chair, Medical Genome Initiative
Professor & Vice Chairman for Research
Department of Molecular and Human Genetics
Baylor College of Medicine, Houston, TX
Improved diagnostic rates in a single test
•Comparison of WGS with standard of care genetic testing for clinics throughout SickKids: Diagnostic yield of WGS is 41% (73/203) compared with 19% (38/203) using standard testing
•Average of 3 genetic tests per patient; microarray analysis the most utilized
•Increased yield due to off-target genes but also non-coding (intronic, miRNA) and small copy number changes not detected with other standard methods
Lionel et al., Genet Med (2017); Stavropoulos et al., NPJ Genom Med (2016)
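A quick arithmetic sketch of the yields, using the counts exactly as printed on this slide. Note that 73/203 works out to roughly 36% rather than the 41% shown, so the WGS percentage and count may come from different subsets of the two cited studies; the 19% for standard testing is consistent with 38/203.

```python
def yield_percent(diagnosed, total):
    """Diagnostic yield as a percentage of patients tested."""
    return 100.0 * diagnosed / total

wgs = yield_percent(73, 203)       # counts as printed on the slide
standard = yield_percent(38, 203)
print(f"WGS: {wgs:.1f}%  standard testing: {standard:.1f}%  ratio: {wgs / standard:.2f}x")
```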
Diagnostic Utility of WGS as a first-line genetic test
Medical Genome Initiative
Launched February 2019
•Mission: Expand access to high-quality clinical whole-genome sequencing for the diagnosis of rare genetic germline disease, through the establishment of common laboratory and clinical best practices
•Goals: Develop and publish laboratory & clinical best practices for implementing clinical WGS for the benefit of others looking to set up the test
•Membership: Consortium made up of institutions which have deployed clinical genome sequencing technology for the diagnosis of those with rare germline disorders
Roadmap & Working Groups
Analytical Validation Working Group
Rationale
•No standards or consensus as to what constitutes a clinical WGS test nor what performance metric thresholds must be met
Goal
•Define analytic metrics and thresholds for WGS that show no loss in performance compared to microarray and whole-exome sequencing
Status
•Published
•Currently inactive
•Plans to reinstate and expand group to tackle more topics in depth (e.g., repeat expansions)
Christian Marshall
Clinical Utility Working Group
Rationale
•Generating and evaluating evidence of clinical WGS is complex (i.e. effectiveness of WGS is not easily tied to a predefined health outcome)
Goal
•Develop a measurement toolkit to offer resources and practical guidance using objective and validated measures
Status
•Published
•Currently inactive
Robin Hayeems
Patient Selection/Indications Working Group
Rationale
•Selecting patients for whom clinical WGS would offer the most benefit can be challenging for healthcare providers
Goal
•Develop evidence-based and consensus-driven best practice recommendations for which patient groups should receive WGS as a first-tier test
Method
•Clinician survey of current use
•Systematic evidence review + expert opinion
Status
•ACTIVE
•Estimated publication date: August 2021
Kristen Wigby
Data Infrastructure & Management Working Group
Rationale
•Guidance and recommendations for what infrastructure is needed to set up clinical WGS are lacking due to the rapid pace at which the field is developing
Goal
•Describe current solutions and develop best practice recommendations for storage and management of the large volume of sequence and health data generated by clinical WGS
Method
•Target audience = laboratories in the initial stages of setting up clinical WGS
•Divide into 4 domains
•Informatics
•Software development and deployment
•Information management technology
•Data security
Status
•ACTIVE, estimated publication date: August 2021
Test Interpretation & Reporting Working Group
Rationale
•Guidance on how best to prioritize detection of variants relevant to the clinical phenotype while minimizing the return of highly uncertain or clinically irrelevant results is lacking
Goal
•Develop recommendations for selecting and validating appropriate tools to detect and analyze the full range of variant types that can be captured by clinical WGS
Method
•Requisition/Consent •Annotations •Analysis
•Case & variant interpretation •Reporting •Reanalysis
Status
•ACTIVE
•Estimated publication date: June 2021
Christina Austin-Tse
Vaidehi Jobanputra
Future Directions
•Publish manuscripts from active working groups
•Reinstate inactive working groups where there is interest and bandwidth
•Revise roadmap to include future topics of interest and work products
•Implementation, reimbursement
•Webinars, community discussion forums
•Expand membership to capture global representation and perspectives
•Individual contributor
•Institutional membership
•Engage with other initiatives and consortia to identify synergistic areas leading to potential collaboration
•GA4GH
Opportunities for GA4GH Collaboration
Medical Genome Initiative Working Group | Relevant GA4GH Work Stream(s) | Comments
Data Infrastructure and Management | Data Security; Genomic Knowledge Standards; Large Scale Genomics; Data Use & Researcher Identities | File formats; data privacy and security policy; variant annotation/representation
Test Interpretation and Reporting | Regulatory & Ethics; Genomic Knowledge Standards | Consent Toolkit & Policy; Return of results (survey of stakeholder perspectives); variant annotation/representation
Questions?
Consortia & Publications Project Manager: Stacie Taylor (Illumina) | Website management: Holly Snyder (Illumina)
Shashi Kulkarni, Baylor Medicine (Chairperson)
Hutton Kearney, Mayo Clinic
Euan Ashley, Stanford Medicine
Heidi Rehm, Broad Institute
John Belmont, Illumina
David Bick, HudsonAlpha Institute for Biotechnology
David Dimmock, Rady Children’s Institute for Genomics
Vaidehi Jobanputra, New York Genome Center
Christian Marshall, The Hospital for Sick Children
Teri Manolio, NHGRI (Contributor)
Audience Q&A Session
Human Pangenome Reference Consortium
Ira Hall, Yale School of Medicine
The Human Pangenome Project: progress towards the initial resource
Ira Hall, Yale University School of Medicine
3/1/21
On behalf of the Human Pangenome Reference Consortium (HPRC)
Goal: a pangenome reference to replace GRCh38
Roadmap:
Sample selection
future:
Samples & Consent WG (co-chairs Eimear Kenny & Karen Miga)
initial sample selection method from Heng Li & Richard Durbin
the first 100 samples
cover genetic diversity
availability of low passage lines
availability of trios
open access
Technology & Production WG
(co-chairs Karen Miga & Bob Fulton)
Data production
Year 1 data freeze
30 HiFi (30x, 17-20kb)
30 ONT Ultra-Long (~6x 100 kb+)
60 Parental Datasets (30x, 150 bp PE)
30 Bionano Maps (N50>250kb, ~100X coverage)
30 Hi-C (Omni-C, ~60X)
10 Strand-Seq single-cell libraries
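The Bionano figure above is quoted as an N50. As a reminder of what that statistic means, here is a minimal sketch: N50 is the length such that pieces of at least that length contain half of the total bases. The lengths in the example are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of molecule or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")

# Total is 290, so the running sum passes 145 at the second-longest piece.
print(n50([100, 80, 50, 30, 20, 10]))  # 80
```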
https://humanpangenome.org/year-1-sequencing-data-release/
https://github.com/human-pangenomics/HPP_Year1_Data_Freeze_v1.0
data wrangling by the UCSC Team
Assembly bake-off
23 assemblies from 14 groups:
credit: all assembly teams; Assembly WG; Evaluation Team (Jarvis, Howe, et al.)
HG002
Phased diploid assembly with PacBio HiFi data + trio-based hifiasm
Assembly production & data management: Paten lab
Dockstore
The AnVIL
Preliminary assembly results
Assembly WG; Mobin Asri, Julian Lucas
Preliminary analysis of genome variation
alignment of each assembly to GRCh38 (non-repetitive regions)
[Figure: num. variants for AFR, EAS, and AMR samples; panels: single nucleotide variants, structural variants (≥50 bp), indels (<50 bp)]
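The size classes used in this comparison can be written as a small rule. A minimal sketch, assuming plain REF/ALT allele strings and the 50 bp threshold from the slide; the `classify_variant` name is hypothetical, and symbolic alleles and complex events, which real callers handle, are ignored.

```python
def classify_variant(ref, alt, sv_threshold=50):
    """Classify a variant by allele lengths: SNV, indel (<50 bp), or SV (>=50 bp)."""
    if len(ref) == len(alt):
        # Equal-length alleles: a single-base change is an SNV; longer
        # substitutions are outside the three classes on the slide.
        return "SNV" if len(ref) == 1 else "other"
    size = abs(len(alt) - len(ref))
    return "SV" if size >= sv_threshold else "indel"

print(classify_variant("A", "G"))             # SNV
print(classify_variant("A", "ATT"))           # indel
print(classify_variant("A" + "C" * 60, "A"))  # SV
```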
Hall lab:
Haley Abel
Wen-Wei Liao
Allison Regier
Pangenome WG (co-chairs Paten, Li, Hall)
pairwise variant calling
Graph construction by whole genome multiple alignment
Pangenome WG
minigraph + cactus (Li & Paten Labs)
pangenome graph builder (pggb) (Garrison et al.)
MHC
consensus graph
~500kb from chr11:20Mb
one bubble = one variant
recent minigraph run @ ~100bp resolution:
Pangenome representation
Pangenome WG
Ongoing work
Acknowledgements
a few illustrative examples
adapted from Heng Li
C4 locus: schizophrenia GWAS hit
Sekar et al. (2016)
chr6 & the MHC (pggb)
chr6
HLA-A
HLA-B
HLA-C
HLA-DR
HLA-DQ
courtesy of Erik Garrison
courtesy of Heng Li
CR1 locus associated with Alzheimer’s
courtesy of Heng Li
CYP2D6 locus involved in drug metabolism
courtesy of Heng Li
RHD locus: RH blood group: observed two new alleles
courtesy of Heng Li
a variable number tandem repeat (VNTR)
Wen-Wei Liao (Hall lab)
Some repetitive & complex regions remain inaccessible:
Audience Q&A Session
International Hundred-thousand Cohorts Consortium, Cohort Atlas
Thomas Keane and Mélanie Courtot, EMBL-EBI
tk2@ebi.ac.uk • mcourtot@ebi.ac.uk
International 100K+ Cohorts Consortium (IHCC): Premise
IHCC: Vision
To enhance scientific understanding of the biological, environmental, and genetic basis of disease and to improve population health.
By the creation of a global network of large cohorts (with multi-dimensional data from diverse populations).
~60 cohorts, ~30M participants
First Summit (2018): 100 Attendees, 24 Countries
Challenges to Combining Cohorts
IHCC Cohort Atlas Project
Building a common framework
[Diagram: IHCC framework: Data Models; Cohorts; Tools & Processes]
GA4GH Data Use Ontology
Jonathan Lawson
Genomics Cohort Knowledge Ontology (GECKO)
Fiona Brinkman
Registry and mapping
IHCC cohort registry
IHCC cohort mappings
Automated mapping pipeline for cohort owners
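To make the mapping idea concrete, here is a minimal sketch of mapping each cohort's own variable names onto shared terms and then filtering cohorts by a shared term. The lookup table, term labels, cohort names, and function names are all hypothetical placeholders, not actual GECKO entries or the IHCC pipeline.

```python
# Hypothetical lookup from cohort-specific variable names to shared terms.
# The term labels are placeholders, not actual GECKO entries.
SHARED_TERMS = {
    "bmi": "body mass index",
    "body_mass_idx": "body mass index",
    "smoker": "smoking status",
    "smoking_status": "smoking status",
}

def map_data_dictionary(variables):
    """Split a cohort's variable names into mapped and unmapped ones."""
    mapped, unmapped = {}, []
    for name in variables:
        term = SHARED_TERMS.get(name.strip().lower())
        if term is None:
            unmapped.append(name)  # left for manual curation
        else:
            mapped[name] = term
    return mapped, unmapped

def cohorts_with_term(cohorts, term):
    """Names of cohorts whose data dictionary maps to the given shared term."""
    return sorted(
        name for name, variables in cohorts.items()
        if term in map_data_dictionary(variables)[0].values()
    )

cohorts = {
    "Cohort A": ["BMI", "smoker", "postcode"],
    "Cohort B": ["body_mass_idx", "diet_score"],
    "Cohort C": ["smoking_status"],
}
print(cohorts_with_term(cohorts, "body mass index"))  # ['Cohort A', 'Cohort B']
```

Once variables are mapped to shared terms, the same lookup supports both harmonization and browsing, since cohorts can be filtered without knowing their local naming.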
Applying these techniques to clinical cohorts...
Initial set of cohorts
OVERALL FRAMEWORK
OVERALL IHCC FRAMEWORK
IHCC cohort atlas
Reference to external cohort sites
Intuitive filtering by cohort metadata & data dictionary attributes
Cohort presentation and display
Christina Yung
Philip Awadalla
Pipeline can be reused
Models can be extended
Morris Swertz
IHCC Cohort Atlas
Acknowledgements
James Overton
Rebecca Jackson
Nicolas Matentzoglu
Isuru Liyanage
Giselle Kerry
Melanie Courtot
Thomas Keane
Philip Awadalla
Dan Brake
Chris Lunt
Eric Plummer
Contact us! ihcc-browser@googlegroups.com
Christina Yung
Rosi Bajari
Minh Ha
Kim Cullion
Audience Q&A Session
Time for a break! Join us in 6 hours for:
March 1 Working Sessions
21:00 UTC | Genetic Discrimination Observatory | Cloud Work Stream Meeting | Variation Representation Specification (VRS) 1.3 Planning & Implementation Guidelines
22:00 UTC | Data Access Committee Review Standards (DACReS) | Federated Analysis Systems Project (FASP) (22:30 UTC) | VCF/VRS/refget Alignment
23:00 UTC | Variant Annotation