1 of 134

#GA4GHConnect21

bit.ly/GA4GHConnect21

ga4gh.org

2 of 134

GA4GH Connect 2021: Housekeeping

Justina Chung

3 of 134

Standards for Professional Conduct

Participants in GA4GH meetings and activities must follow the GA4GH Standards for Professional Conduct:

bit.ly/ga-professional-conduct

4 of 134

Conflicts of Interest

Share verbally or type in the chat

Do any conflicts need to be addressed before moving forward?

5 of 134

Closed Captioning

Adjust the caption size in Windows, macOS, or Linux:

    • Click the up arrow (^) next to Start Video / Stop Video.
    • Click Video Settings then Accessibility.
    • Move the slider to adjust the caption size.

Chrome OS / Windows / macOS

  • Click “Closed Caption” in meeting controls at bottom of screen to start viewing closed captioning

Web / Linux

  • Click “Closed Caption/Live Transcript” in meeting controls at bottom of screen
  • Select Show Subtitle to start viewing closed captioning

Android / iOS Mobile Apps

  • Tap the Settings icon.
  • Tap Meeting.
  • Toggle Closed Captioning to on.

bit.ly/GA-Zoom-CC

6 of 134

Staying Connected During the Meeting

bit.ly/GA4GHConnect21

bit.ly/join-GA-slack

7 of 134

GA4GH Connect Virtual Meeting Agenda

8 of 134

9 of 134

10 of 134

11 of 134

12 of 134

Welcome to the virtual lobby!

13 of 134

Ask Ewan (almost) Anything!!

STEP 1: Navigate to Sli.do in your web browser or download the Slido mobile app

STEP 2: Enter event Code #GA4GH

STEP 3: Enter Plenary room

STEP 4: Type in your question

STEP 5: Upvote others’ questions

NOTE: You may need to minimize your Zoom window to view both the meeting & Sli.do on your screen

14 of 134

Ask Ewan (almost) Anything!

15 of 134

Building Momentum on Implementation

GA4GH Starter Kit

15 min

16 of 134

THE OPPORTUNITY...

If we can enable secondary use of clinical genomic data for research, we will have a virtual cohort of >60 million samples by 2025.

17 of 134

The GA4GH Ecosystem

3000+ Subscribers

600+ Organizational Members

90+ Countries

24 Driver Projects

8 Work Streams

20 Technical Standards

7 Regulatory Policies & Frameworks

40+ Implementations & Deployments

Enabling the global learning health system

18 of 134

19 of 134

20 of 134

21 of 134

Approved Technical Standards

Cloud

  • Workflow Execution Service API v1
  • Tool Registry Service API v2
  • Data Repository Service API v1
  • Task Execution Service API v1

Large Scale Genomics

  • htsget v1
  • refget v1
  • Read File Formats
  • Variation File Formats
  • Crypt4GH v1
  • RNAget API v1

Clin/Pheno Data Capture

  • Phenopackets v1

Genomic Knowledge Standards

  • Variation Representation v1

Data Use & Researcher Identities

  • Data Use Ontology v1
  • GA4GH Passports v1 / AAI

Discovery

  • Beacon API v1
  • Service Info/Registry API v1

Learn more: ga4gh.org/toolkit

22 of 134

GA4GH 2020 Connection Demos

Driving improvements in future spec iterations based on real-world lessons

bit.ly/GA4GH-Anna

23 of 134

Model 1: Federated data hosting with data release to user

[Diagram: Databases 1-4, each with Curation, Search, and Access; four Analysis steps; User; Meta-analysis; Presentation of Results]

24 of 134

Model 2: Federated analysis of independent resources

[Diagram: Databases 1-4, each with Curation, Search, Access, and Analysis; Meta-analysis across cohorts; User]
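One way to picture the "meta-analysis across cohorts" step in Model 2 is a minimal sketch in which each database runs its own analysis and releases only a summary estimate and standard error. The pooling method shown here (fixed-effect, inverse-variance weighting) is an assumption chosen for illustration, not something the slide specifies, and the four site results are hypothetical numbers.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(site_results: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Pool per-site (estimate, standard_error) pairs by inverse-variance weighting.

    Assumption: a fixed-effect model; the slide does not name a method.
    """
    weights = [1.0 / (se * se) for _, se in site_results]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, site_results)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical summary statistics returned by Databases 1-4 (not from the slides).
sites = [(0.10, 0.05), (0.20, 0.05), (0.15, 0.10), (0.05, 0.10)]
pooled, pooled_se = fixed_effect_meta(sites)
print(f"pooled estimate = {pooled:.4f}, standard error = {pooled_se:.4f}")
```

Only the summary pairs cross site boundaries in this sketch; the individual-level data stay inside each database.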

25 of 134

Model 3: Federated analysis of integrated resources

[Diagram: Databases 1-4, each with Curation, Search, and Analysis; Control + Meta Analysis; Control, direction; User]

26 of 134

Connection Demo Implementers

BigQuery

27 of 134

GA4GH Starter Kit

Goal: To develop a suite of out-of-the-box, modular, open source community implementations to lower the barrier to genomics interoperability

Audience: Large research organizations and collaborative consortia, as well as smaller research and clinical labs

28 of 134

GA4GH Interop “Nirvana”

Cloud First, Full Stack, FASP demos

Discussion, Work Streams, Reference implementations

HPC compatible, Cloud compatible, modular reference implementation Starter Kit

Real world problems from Driver Projects + GA4GH Community

29 of 134

Announcing GA4GH

Chief Standards Officer

We are excited to have Dr. Susan Fairley, PhD, join GA4GH as the new Chief Standards Officer based at EMBL-EBI!

30 of 134

Thank You

Thank you for participating while:

  • Taking on new responsibilities relating to COVID-19
  • Working on the front line in clinical settings
  • Adjusting to new work environments

GA4GH has an amazing community and this is an opportunity for us to support each other.

The past year has not been normal...

31 of 134

Meeting Goals

GA4GH Work Streams, FASP and EDI Advisory Group

32 of 134

Regulatory & Ethics

Yann Joly

5 min + 2 min Q&A

33 of 134

Regulatory & Ethics

Genetic Discrimination Observatory (GDO) - March 1 @ 21:00 UTC

  • The GDO is a network of researchers and other stakeholders dedicated to researching and preventing discrimination based on genomic and other omic data
  • Exploring GDO with GA4GH - open discussion aimed at defining new work items

Data Access Committee Review Standards (DACReS) - March 1 @ 22:00 UTC

  • Joint effort with DURI, defining procedural standards for DACs
  • Jointly drafting the Purpose and Transparency sections of the procedural standards policy document

General REWS Meeting - March 2 @ 12:00 UTC

  • Driver Project Updates - calling all DPs to join and share their recent efforts in the REWS space!

Return of Results Policy - March 3 @ 12:00 UTC

  • Last opportunity to provide comments/feedback on the aspirational policy before its finalisation

REWS-EDI Alignment - March 3 @ 13:00 UTC

  • Integrating Equity, Diversity, and Inclusion into the GA4GH Product Approval Process
  • Engage GA4GH contributors to brainstorm/develop a new EDI evaluation

34 of 134

Audience Q&A Session

35 of 134

Data Security

Jean-Pierre Hubaux

5 min + 2 min Q&A

36 of 134

Data Security

Federated Analysis and Cloud Security - March 3 @ 21:00 UTC

  • Exploring how to deal with secure federated analysis using GA4GH Cloud APIs
    • Calling interested DPs
  • Meeting Goal: Discuss why extra security considerations may be needed for Cloud APIs and how it would work at a high-level
  • The meeting will start with a presentation showing how to privacy-protect partial aggregates without adding noise
  • We hope for the participation of the Cloud Work Stream, TopMed, HDR UK, ICGC-ARGO and more generally all driver projects facing the challenge of siloed data

Data Security Work Stream Meeting - March 4 @ 13:30 UTC

  • Gauge interest in the “Malfeasance rules” product and recruit volunteers and leads
  • Polish “Breach Response”, especially for ELIXIR

37 of 134

Audience Q&A Session

38 of 134

Large Scale Genomics

Oliver Hofmann & Thomas Keane

5 min + 2 min Q&A

39 of 134

Large Scale Genomics

VCF/VRS/Refget Alignment (March 1st, 22:00 - 23:00 UTC)

VRS/VCF/Refget/Sequence Annotation teams

Common Terms & Data Models; Translating between VRS and VCF

Large Scale Genomics Work Stream (March 2nd, 13:30 - 15:00 UTC)

LSG participants, driver projects

Project updates

Re-starting Future of VCF initiative

Community engagement and finding maintainers

40 of 134

Large Scale Genomics

Key Management in the Cloud (March 2nd, 21:00 - 22:30 UTC)

LSG/Cloud/DURI, Driver Projects with interest in Crypt4GH

Handling of encryption key in cloud environments

Interaction of Crypt4GH and DRS

Sequence Annotation (March 3rd, 22:30 - 00:00 UTC)

LSG/GKS, Driver Projects

Discussion of SA scope

Feature exploration, managing entity relationships

41 of 134

Future of VCF crossroads

Population scale collections of genetic variation

  • Community-based standards (VCF) scale super-linearly due to rare variants, e.g. gnomAD, Genomics England, UK Biobank, All of Us
  • Real risk of forking of formats for exchanging/accessing

Working group commenced in 2019

  • Initial momentum and engagement towards common goal
  • Delays in data access, personnel transitions, disengagement, de-prioritisation and local demands

Possible scenarios:

  • Multiple tool specific representations for compressed storage of variant data w/ single implementations (VCF for interchange only)
  • Refocus efforts on developing a major VCF increment to support population scale, multiple community implementations

Option 2 (a major VCF increment) requires serious engagement from the main variant callers and cohorts to achieve

42 of 134

Audience Q&A Session

43 of 134

Genomic Knowledge Standards

Andy Yates

44 of 134

Genomic Knowledge Standards

VRS 1.3 and Implementation Guidelines - March 1st @ 21:00 UTC

  • Recap VRS 1.2 and discuss plans and the call for contributions for VRS 1.3
  • Community introduction to the VRSATILE Approach

VRS/VCF/RefGet alignment - March 1st @ 22:00 UTC

  • Drive collaboration with other standards teams to coordinate on terminology and tools

Variation Annotation - March 1st @ 23:00 UTC

  • Define the concrete deliverables, timeline, and testing framework for upcoming v0 release
  • Complete draft models for all four VA Statement types included in the v0 spec

Phenopackets and VA/VR integration - March 2nd @ 22:30 UTC

  • Discussing a proposal for how VA and VR could work together with Phenopackets (with Clin/Pheno)

Sequence annotation - March 3rd @ 22:30 UTC

  • Identify and define the genomic features needed by GA4GH work streams for inclusion in the Sequence Annotation model

45 of 134

Audience Q&A Session

46 of 134

Discovery

Michael Baudis

47 of 134

Discovery Toolbox

A suite of general purpose standards empowering data sharing networks

For Organizations

  • Service Info: a general-purpose way to describe individual services (approved)
  • Search: a general-purpose way to query & retrieve data (up for approval)
  • Beacon v2: the standard for federated discovery of genomic variant data (v1 approved, v2 pending submission)
  • SchemaBlocks: representing data models and components (used by Search and Beacon)

For Networks

  • Service Registry: a general-purpose way to register collections of services (approved)

Focus on adoption and powering specific use cases

  • Data Exploration
  • Variant Discovery (e.g. Beacon Networks v2)
  • Case Discovery (e.g. Matchmaker Exchange v2)
  • Any distributed network of common services
  • etc.

48 of 134

Discovery

FASP Updates - March 1st @ 22:30 UTC

  • Continuing collaboration with the FASP team on enabling federated discovery

Discovery Work Stream - March 2 @ 12:00 UTC

  • Bring alignment across Discovery Work Stream projects

Beacon v2 - March 2 @ 13:30 UTC

  • Discussing the latest Beacon v2 updates, with a focus on filters, structural variants, and cohorts.

Phenopackets and Pedigree Integration with Beacon & Search API - March 3 @ 21:00 UTC

  • Discussing how Clin/Pheno standards can integrate with Discovery standards

DRS Alignment with Beacon & Search - March 3 @ 23:30 UTC

  • Harmonize metadata models for standardized searching of DRS objects through Beacon V2 and Search

SchemaBlocks {S}[B] - March 4 @ 12:30 UTC

  • Going forward - contributions, technicalities & {S}[B]'s role for the GA4GH standards ecosystem

Meeting Goals: Cross-Work Stream collaborations and internal alignments

49 of 134

Audience Q&A Session

50 of 134

Data Use & Researcher Identities

Jaime Guidry Auvil & Craig Voisin

51 of 134

Data Use

“A better DUO experience for users”

February 2021, new DUO release: hierarchy reorganised into “permissions” and “modifiers” applicable on these permissions.

Reflects input from driver projects and adopters, and aligns with our roadmap goal of improving documentation and guidance.

UX work in progress - interviews conducted and report being compiled

52 of 134

Data Use

Improved documentation and outreach

  • Papers under review:
    • DUO standard
    • DUO empirical evaluation
  • Governance policy updated
  • ELSI review in progress with Regulatory & Ethics workstream

DUO implementers as of February 2021

DUO Meetings

  • Data Access Committee Review Standards (DACReS) - March 1 @ 22:00 UTC
    • Drafting the Purpose and Transparency sections of the procedural standards policy document (with REWS)
  • Genomics in Health Implementation Forum (Virtual Working Meeting) presentation March 9
    • DUO workshop planned May 6-7 (time TBC based on attendees' time zones)

53 of 134

Researcher IDs (Passport)

Connect Meetings involving Passports

  • Federated Analysis Systems Project (FASP): March 1st @ 22:30 UTC
    • Demo review and setting FASP 2021 goals related to Passport integrations
  • DRS + Passports: March 3rd @ 13:30 UTC
    • Review technical DRS/Passport integration proposal (sneak peek, still evolving)
    • Bring use cases to identify and resolve technical or policy issues
    • Resolve how combo of Cloud/DURI specs will capture these changes
  • DRS Alignment with Beacon and Search: March 3rd @ 22:30 UTC
    • Identify IAM issues during Beacon/Search use cases of DRS

See the Passport Roadmap for more detailed goals over 2021

  • Passport v1.1 is coming in 2021; proposals are currently being gathered

Update: final preparation of the Passport manuscript before submission

54 of 134

Audience Q&A Session

55 of 134

Cloud

David Glazer

56 of 134

Cloud

The Cloud Work Stream is focused on creating specific standards for defining, sharing, and executing portable workflows and accessing data across clouds.

Our API specifications

TRS

  • Share tools and workflows for consistent, reproducible results

WES

  • Execute workflows remotely on a defined set of data

TES

  • Recently approved as an official GA4GH standard! (on Jan 20th, 2021)
  • Execute individual tasks within an overall workflow

DRS

  • Access genomic data across a variety of cloud storage platforms
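As a rough illustration of how DRS gives one way to address data across storage platforms, the sketch below turns a hostname-based `drs://` URI into the HTTPS URL a client would request. The path layout `/ga4gh/drs/v1/objects/{object_id}` reflects the DRS v1 convention as understood here and should be checked against the specification; the host `drs.example.org` and the object ID `abc123` are hypothetical. No network call is made.

```python
from urllib.parse import urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to the HTTPS URL of its object record.

    Assumption: the DRS v1 layout https://<host>/ga4gh/drs/v1/objects/<id>.
    """
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


# Hypothetical host and object ID, for illustration only.
example_url = drs_uri_to_url("drs://drs.example.org/abc123")
print(example_url)
```

A workflow engine (WES or TES) could be handed the `drs://` form and resolve it only at run time, which is what keeps the workflow description portable across clouds.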

57 of 134

Cloud

Connect Goals:

  • Priority-set API spec enhancements for 2021 based on Driver Project need
  • Discuss our high-profile issues and translate into open pull requests

Cloud WS - March 1st @ 21:00 UTC

  • Driver projects weigh in on what new features are needed for our API specifications

Key Management in the Cloud - March 2nd @ 21:00 UTC (with Large Scale Genomics WS)

  • Secure management of encryption keys to encrypt/decrypt Crypt4GH files stored in the cloud

DRS + Passports - March 3rd @ 13:30 UTC (with DURI WS)

  • Formalize the token handoff process between researcher, passport broker, and DRS service for standardized, secure access to controlled DRS datasets

DRS Alignment with Beacon and Search - March 3rd @ 22:30 UTC (with Discovery WS)

  • Drive collaboration with other standards teams to coordinate on terminology and tools
  • Harmonize metadata models for standardized searching of DRS objects through Beacon V2 and Search

58 of 134

Audience Q&A Session

59 of 134

Clinical & Phenotypic Data Capture

David Hansen

60 of 134

Clinical & Phenotypic Data Capture

Phenopackets and VA/VR Integration

Clin/Pheno & GKS - March 2 @ 22:30 UTC

  • Discussing a proposal for how VA and VR could work together with Phenopackets

Next Steps/Future Directions: Computable Cohort Representation

  • Beacon and Search applications - identify a set of patients with a particular phenotype, inclusion/exclusion criteria
  • Calling Driver Projects for use cases and requirements

Phenopackets and Pedigree Integration with Beacon and Search API

Clin/Pheno & Discovery - March 3 @ 21:00 UTC

  • Identifying expected use cases and query types, feedback on Phenopackets integrations
  • Preparing for Pedigree search applications

Connect Goals: Integrating Clin/Pheno efforts with other Work Streams

61 of 134

Audience Q&A Session

62 of 134

FASP

Max Barkley

63 of 134

FASP

A cross-workstream effort to promote interoperability between our GA4GH standards

FASP Promotes

  • Combining GA4GH standards to solve scientific use cases
  • Identifying gaps and obstacles for Driver Projects and vendors
  • Establishing a feedback cycle with Work Streams

Data discovery, controlled access (DURI), and analysis (Cloud)

64 of 134

FASP

Connect Goals:

  • Bring in more opinions on current interop challenges initiated during January hackathon
  • Further discuss our priority issues and translate into open pull requests

FASP - March 1st @ 22:30 UTC

  • Recap of 2020 demos, goal-setting for 2021
  • Identify new scientific research cases requiring multiple GA4GH standards to tackle in 2021

DRS + Passports - March 3rd @ 13:30 UTC (Cloud and DURI)

  • Formalize the token handoff process between researcher, passport broker, and DRS service for standardized, secure access to controlled DRS datasets

DRS Alignment with Beacon and Search - March 3rd @ 22:30 UTC (Cloud and Discovery)

  • Drive collaboration with other standards teams to coordinate on terminology and tools
  • Harmonize metadata models for standardized searching of DRS objects through Beacon V2 and Search

65 of 134

Audience Q&A Session

66 of 134

EDI Advisory Group

Melissa Konopko

67 of 134

EDI Advisory Group

“Equity, Diversity, and Inclusion is not a choice. It is the only way that we, as a global standards-setting organization, can proceed”

-Laura Paglione

68 of 134

EDI Advisory Group

Develop team inclusivity and diversity to support the creation of standards that meet the needs of the global community we represent.

EDI Workshop for Work Streams

  • Who: WS Leads and any team members who wish to enhance the diversity and inclusivity in their teams. You!
  • What:
    • Presentation of the ongoing projects
      • New member onboarding
      • Mechanisms and procedures for advancement within the org
      • Equity eval in the Standards Review Process w/ REWS: equitable by design
    • Discussion, brainstorming, & feedback: are these the right projects?
  • When: Wednesday 1 pm UTC
  • Why: This impacts your standards! Please be part of the discussion

69 of 134

Audience Q&A Session

70 of 134

External Initiatives: Opportunities for Collaboration

Medical Genome Initiative

Human Pangenome Reference Consortium

International Hundred-thousand Cohorts Consortium, Cohort Atlas

71 of 134

Medical Genome Initiative

Moving whole-genome sequencing for rare disease diagnosis to the clinic

www.medgenomeinitiative.org

Shashikant Kulkarni, M.S. (Medicine), PhD, FACMG

Chair, Medical Genome Initiative

Professor & Vice Chairman for Research

Department of Molecular and Human Genetics

Baylor College of Medicine, Houston, TX

72 of 134

Improved diagnostic rates in a single test

Comparison of WGS with standard of care genetic testing for clinics throughout SickKids: Diagnostic yield of WGS is 41% (73/203) compared with 19% (38/203) using standard testing

Average of 3 genetic tests per patient; microarray analysis the most utilized

Increased yield due to off-target genes but also non-coding (intronic, miRNA) and small copy number changes not detected with other standard methods

Lionel et al., Genet Med (2017); Stavropoulos et al., NPJ Gen Med (2016)

73 of 134

Diagnostic Utility of WGS as a first-line genetic test

  • WGS may be a useful first line genetic test but Clinical Validation of WGS is challenging and there are no clear standards in place
  • Professional bodies have made progress but specific challenges not addressed

74 of 134

Medical Genome Initiative

Launched February 2019

Mission: Expand access to high-quality clinical whole-genome sequencing for the diagnosis of rare genetic germline disease, through the establishment of common laboratory and clinical best practices

Goals: Develop and publish laboratory & clinical best practices for implementing clinical WGS for the benefit of others looking to set up the test

Membership: Consortium made up of institutions which have deployed clinical genome sequencing technology for the diagnosis of those with rare germline disorders

75 of 134

Roadmap & Working Groups

76 of 134

Analytical Validation Working Group

Rationale

No standards or consensus as to what constitutes a clinical WGS test nor what performance metric thresholds must be met

Goal

Define analytic metrics and thresholds for WGS that show no loss in performance compared to microarray and whole-exome sequencing

Status

Published

Currently inactive

Plans to reinstate and expand group to tackle more topics in depth (e.g., repeat expansions)

Christian Marshall

77 of 134

Clinical Utility Working Group

Rationale

Generating and evaluating evidence of clinical WGS is complex (i.e. effectiveness of WGS is not easily tied to a predefined health outcome)

Goal

Develop a measurement toolkit to offer resources and practical guidance using objective and validated measures

Status

Published

Currently inactive

Robin Hayeems

78 of 134

Patient Selection/Indications Working Group

Rationale

Selecting patients for whom clinical WGS would offer the most benefit can be challenging for healthcare providers

Goal

Develop evidence-based and consensus-driven best practice recommendations for which patient groups should receive WGS as a first-tier test

Method

Clinician survey of current use

Systematic evidence review + expert opinion

Status

ACTIVE

Estimated publication date: August 2021

Kristen Wigby

79 of 134

Data Infrastructure & Management Working Group

Rationale

Guidance and recommendations for what infrastructure is needed to set up clinical WGS are lacking due to the rapid pace at which the field is developing

Goal

Describe current solutions and develop best practice recommendations for storage and management of the large volume of sequence and health data generated by clinical WGS

Method

Target audience = laboratories in the initial stages of setting up clinical WGS

Divided into 4 domains:

  • Informatics
  • Software development and deployment
  • Information management technology
  • Data security

Status

ACTIVE, estimated publication date: August 2021

80 of 134

Test Interpretation & Reporting Working Group

Rationale

Guidance is lacking on how best to prioritize detection of variants relevant to the clinical phenotype while minimizing the return of highly uncertain or clinically irrelevant results

Goal

Develop recommendations for selecting and validating appropriate tools to detect and analyze the full range of variant types that can be captured by clinical WGS

Method

Requisition/Consent, Annotations, Analysis, Case & variant interpretation, Reporting, Reanalysis

Status

ACTIVE

Estimated publication date: June 2021

Christina Austin-Tse

Vaidehi Jobanputra

81 of 134

Future Directions

Publish manuscripts from active working groups

Reinstate inactive working groups where there is interest and bandwidth

Revise roadmap to include future topics of interest and work products

Implementation, reimbursement

Webinars, community discussion forums

Expand membership to capture global representation and perspectives

Individual contributor

Institutional membership

Engage with other initiatives and consortia to identify synergistic areas leading to potential collaboration

GA4GH

82 of 134

Opportunities for GA4GH Collaboration

Medical Genome Initiative Working Group: Data Infrastructure and Management

  • Relevant GA4GH Workstream(s): Data security; Genomic knowledge standards; Large scale genomics; Data use and researcher identities
  • Comments: File formats; Data privacy and security policy; Variant annotation/representation

Medical Genome Initiative Working Group: Test Interpretation and Reporting

  • Relevant GA4GH Workstream(s): Regulatory and Ethics; Genomic Knowledge Standards
  • Comments: Consent Toolkit & Policy; Return of results – Survey of stakeholder perspectives; Variant annotation/representation

83 of 134

Questions?

Consortia & Publications Project Manager: Stacie Taylor (Illumina) | Website management: Holly Snyder (Illumina)

  • Shashi Kulkarni, Baylor Medicine (Chairperson)
  • Hutton Kearney, Mayo Clinic
  • Euan Ashley, Stanford Medicine
  • Heidi Rehm, Broad Institute
  • John Belmont, Illumina
  • David Bick, HudsonAlpha Institute for Biotechnology
  • David Dimmock, Rady Children’s Institute for Genomics
  • Vaidehi Jobanputra, New York Genome Center
  • Christian Marshall, The Hospital for Sick Children
  • Teri Manolio, NHGRI (Contributor)

84 of 134

Audience Q&A Session

85 of 134

Human Pangenome Reference Consortium

Ira Hall, Yale School of Medicine

86 of 134

The Human Pangenome Project: progress towards the initial resource

Ira Hall, Yale University School of Medicine

3/1/21

On behalf of the Human Pangenome Reference Consortium (HPRC)

87 of 134

Goal: a pangenome reference to replace GRCh38

  • build the core resource
    • high-quality assemblies from >350 diverse humans
    • a map of variants and haplotypes
    • a reference data structure & coordinate system
  • nucleate and foster an ecosystem of tools
  • promote adoption, guide deployment
  • establish a model for long-term growth via international collaboration

Roadmap:

88 of 134

Sample selection

future:

  • recruitment at Mt. Sinai (BIOME) and WashU
  • seeking global partnerships to increase diversity
  • new Ethics Working Group (Koenig, Cook-Deegan, et al.)

Samples & Consent WG (co-chairs Eimear Kenny & Karen Miga)

initial sample selection method from Heng Li & Richard Durbin

the first 100 samples

cover genetic diversity

availability of low passage lines

availability of trios

open access

89 of 134

  • Our goal is to work with GA4GH to organize a global pangenome reference community
  • This community will benefit from the established data sharing models/principles and Ethics, Diversity and Inclusion expertise at GA4GH

90 of 134

Technology & Production WG

(co-chairs Karen Miga & Bob Fulton)

  • PacBio HiFi
  • ONT Ultra-Long
  • + HiC, BioNano, Illumina, StrandSeq

Data production

91 of 134

Year 1 data freeze

30 HiFi

(30x, 17-20kb)

30 ONT Ultra-Long

(~6x 100 kb+)

60 Parental Datasets

(30x, 150 bp PE)

30 Bionano Maps (N50>250kb, ~100X coverage)

30 Hi-C

(Omni-C, ~60X)

10 Strand-Seq

single-cell libraries

  • sequencing data for 30 HPRC samples deposited in AnVIL, AWS, SRA/NCBI, ENA/EBI
  • + 15 additional samples from other projects (WashU, UCSC, NHGRI, GIAB, HGSVC)
  • + 2 reference genomes (GRCh38 & CHM13)
  • 92 haploid genomes

data wrangling by the UCSC Team

92 of 134

Assembly bake-off

23 assemblies from 14 groups:

  • with trios, we can achieve high quality diploid contig assemblies from HiFi data alone
  • hifiasm:
    • 519 contigs; NG50=43Mb
    • phase-block NG50=18Mb
    • Q54 (Merqury)
    • het. SNP sensitivity: 99.3%
  • need to improve automated methods for scaffolding, T2T, & non-trio assembly
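For readers unfamiliar with the NG50 figures quoted above, here is a minimal sketch of the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the estimated genome size. The contig lengths and genome size below are toy values, not HPRC data.

```python
from typing import Iterable


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """Length of the contig at which cumulative length reaches half the genome size.

    Returns 0 when the assembly covers less than half of the genome.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return 0


# Toy example: five contigs against an assumed genome size of 200.
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, genome_size=200)
# N50 is the same calculation with the assembly's own total length as the size.
toy_n50 = ng50(toy_contigs, genome_size=sum(toy_contigs))
print(toy_ng50, toy_n50)
```

Because NG50 uses a fixed genome size rather than the assembly's own total, it stays comparable across the different assemblies in a bake-off.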

credit: all assembly teams; Assembly WG; Evaluation Team (Jarvis, Howe, et al.)

HG002

93 of 134

Phased diploid assembly with PacBio HiFi data + trio-based hifiasm

Assembly production & data management: Paten lab

Dockstore

The AnVIL

94 of 134

Preliminary assembly results

Assembly WG; Mobin Asri, Julian Lucas

95 of 134

Preliminary analysis of genome variation

alignment of each assembly to GRCh38 (non-repetitive regions)

[Figure panels: single nucleotide variants; structural variants (≥50 bp); indels (<50 bp). Groups: AFR, EAS, AMR. Axis: num. variants]

Hall lab:

Haley Abel

Wen-Wei Liao

Allison Regier

Pangenome WG (co-chairs Paten, Li, Hall)

pairwise variant calling

  • 8 tools, using assemblies & raw reads
  • each sample vs. GRCh38, CHM13, & itself
  • how to merge across tools & genomes?
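The size classes used on this slide (indels below 50 bp, structural variants at 50 bp or more) can be sketched as a simple rule on allele lengths. This is a simplification for illustration only: it assumes VCF-style REF/ALT strings, measures size as the length difference, and does not model inversions or other balanced events; the function name is hypothetical.

```python
def classify_variant(ref: str, alt: str, sv_min_bp: int = 50) -> str:
    """Assign a variant to the slide's size classes from its REF and ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(ref) - len(alt))
    if size >= sv_min_bp:
        return "structural variant"
    if size > 0:
        return "indel"
    # Same-length multi-base substitutions fall outside the three classes shown.
    return "other"


examples = [
    ("A", "G"),              # single base change
    ("A", "A" + "T" * 49),   # 49 bp insertion
    ("A", "A" + "T" * 50),   # 50 bp insertion
    ("AC", "GT"),            # multi-base substitution
]
labels = [classify_variant(ref, alt) for ref, alt in examples]
print(labels)
```

Merging calls across the 8 tools would still need a rule for when two records describe the same event, which this sketch does not attempt.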

96 of 134

Graph construction by whole genome multiple alignment

  • coarse-scale incremental MSA to define structure, GRCh38 added first (minigraph)
  • only collapse orthologous sequences
  • fine-scale MSA to refine structure, define point variants (cactus)
  • rGFA output
  • all-to-all wavefront alignment (edyeet/wfmash)
  • efficient graph induction (seqwish)
  • collapse of orthologous sequences, tandem duplications, and inversions
  • outputs full and consensus graph (GFA) & MSA
  • multiple first-class references in one graph

Pangenome WG

minigraph + cactus (Li & Paten Labs)

pangenome graph builder (pggb) (Garrison et al.)

MHC

consensus graph

97 of 134

~500kb from chr11:20Mb

one bubble = one variant

recent minigraph run @ ~100bp resolution:

  • 84 haploid genomes: GRCh38, CHM13, HPRC+
  • 24 threads, 40 wall clock hours
  • 80,917 bubbles; 74.6Mb of GRCh38 (2.3%) in bubbles
  • 72% bi-allelic; 12% tri-allelic; 16% tetra-allelic or more
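A quick back-of-the-envelope check relates the figures on this slide to each other. The sketch below uses only the quoted numbers; the derived values are approximate because the percentages on the slide are rounded.

```python
# Figures quoted on the slide for the recent minigraph run.
BUBBLES = 80_917
BUBBLE_SPAN_MB = 74.6      # Mb of GRCh38 inside bubbles
BUBBLE_FRACTION = 0.023    # 2.3% of GRCh38
ALLELE_SHARES = {
    "bi-allelic": 0.72,
    "tri-allelic": 0.12,
    "tetra-allelic or more": 0.16,
}

# GRCh38 length implied by "74.6 Mb is 2.3%".
implied_genome_mb = BUBBLE_SPAN_MB / BUBBLE_FRACTION

# Average GRCh38 span covered by one bubble, in base pairs.
mean_bubble_bp = BUBBLE_SPAN_MB * 1_000_000 / BUBBLES

# Approximate number of bubbles in each allele-count class.
approx_counts = {label: round(BUBBLES * share) for label, share in ALLELE_SHARES.items()}

print(f"implied GRCh38 length: {implied_genome_mb:.0f} Mb")
print(f"mean bubble span: {mean_bubble_bp:.0f} bp")
for label, count in approx_counts.items():
    print(f"{label}: about {count} bubbles")
```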

Li et al., Genome Biology (2020)

https://github.com/lh3/minigraph

98 of 134

Pangenome representation

  • key formats: GFA, rGFA, GAF
  • growing ecosystem of tools for alignment, indexing, variant analysis: vg toolkit, gfatools, gfabase, GBWT, graphaligner, minigraph, pggb, giraffe, pangenie, danbing-tk, others
  • natural area of collaboration with GA4GH
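To make the GFA format concrete, here is a minimal sketch that reads the two most common GFA 1 record types, segments (`S`) and links (`L`), and spells out the alleles of a single bubble. The four-segment graph is a toy example invented for illustration. The sketch ignores strand orientation and overlaps and assumes an acyclic graph, so it is not a substitute for tools such as gfatools or the vg toolkit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy graph: s1 branches to s2 or s3, and both rejoin at s4 (one bubble).
GFA_TEXT = "\n".join([
    "H\tVN:Z:1.0",
    "S\ts1\tACGT",
    "S\ts2\tA",
    "S\ts3\tG",
    "S\ts4\tTTGA",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts2\t+\ts4\t+\t0M",
    "L\ts3\t+\ts4\t+\t0M",
])


def parse_gfa(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Collect segment sequences and forward links from GFA 1 text."""
    segments: Dict[str, str] = {}
    links: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # Orientation (fields 2 and 4) and overlap (field 5) are ignored here.
            links[fields[1]].append(fields[3])
    return segments, dict(links)


def spell_paths(segments: Dict[str, str], links: Dict[str, List[str]],
                start: str, end: str) -> List[str]:
    """Spell the sequence of every walk from start to end (acyclic graphs only)."""
    if start == end:
        return [segments[start]]
    return [
        segments[start] + rest
        for nxt in links.get(start, [])
        for rest in spell_paths(segments, links, nxt, end)
    ]


segments, links = parse_gfa(GFA_TEXT)
alleles = spell_paths(segments, links, "s1", "s4")
print(alleles)
```

The two walks through the bubble differ at a single base, which corresponds to the "one bubble = one variant" reading of such graphs.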

Pangenome WG

99 of 134

Ongoing work

  • graph construction
  • graph evaluation & QC
  • proof of principle for common applications
  • Freeze1: 92 assemblies, 1-2 graphs (mid-2021)
  • Freeze2: >200 assemblies (late 2022)
  • seeking global partnerships to increase sample diversity, share data, coordinate on standards

100 of 134

Acknowledgements

101 of 134

a few illustrative examples

102 of 134

adapted from Heng Li

C4 locus: schizophrenia GWAS hit

Sekar et al. (2016)

103 of 134

C4 locus: schizophrenia GWAS hit

adapted from Heng Li

104 of 134

chr6 & the MHC (pggb)

chr6

HLA-A

HLA-B

HLA-C

HLA-DR

HLA-DQ

courtesy of Erik Garrison

105 of 134

courtesy of Heng Li

CR1 locus associated with Alzheimer’s

106 of 134

courtesy of Heng Li

CYP2D6 locus involved in drug metabolism

107 of 134

courtesy of Heng Li

RHD locus: RH blood group: observed two new alleles

108 of 134

courtesy of Heng Li

a variable number tandem repeat (VNTR)

109 of 134

Wen-Wei Liao (Hall lab)

Some repetitive & complex regions remain inaccessible:

  • variable number tandem repeats (VNTRs)
  • simple tandem repeats (STRs)
  • microsatellites, minisatellites, satellites
  • centromeres, telomeres, short arms, heterochromatin

110 of 134

Audience Q&A Session

111 of 134

International Hundred-thousand Cohorts Consortium, Cohort Atlas

Thomas Keane and Mélanie Courtot, EMBL-EBI

tk2@ebi.ac.uk, mcourtot@ebi.ac.uk

112 of 134

International 100K+ Cohorts Consortium (IHCC): Premise

  • Large cohort studies have been established world-wide (some for decades)
  • Each constrained by size, ancestral origins, and geographic boundaries
  • Constraints limit analyses – e.g., subgroup, exposures, and interactions
  • Combining data from these cohorts enables addressing pressing global health questions no one can answer alone
    • Enhance value of each
    • Leverage enormous investments in them

113 of 134

IHCC: Vision

To enhance scientific understanding of the biological, environmental, and genetic basis of disease and to improve population health.

By the creation of a global network of large cohorts (with multi-dimensional data from diverse populations).

114 of 134

~60 cohorts, ~30M participants

First Summit (2018): 100 Attendees, 24 Countries

115 of 134

Challenges to Combining Cohorts

  • Complexity and limited documentation of available data
  • Lack of standardization and harmonization of questionnaires
  • Inability to move, send, receive, or utilize data/samples due to regulatory restrictions and national laws
  • Lack of standards for phenotyping and health outcomes
  • Cross-cultural differences in risk tolerance and privacy

116 of 134

IHCC Cohort Atlas Project

  • Bring together several axes of cohort data discovery, e.g. disease status, data use, sample collection parameters, genotype, and phenotype
  • Gather a highly diverse set of over one hundred cohort data dictionaries

  • IHCC Cohort Atlas aims to:
    • Survey and collate cohort data dictionaries for all IHCC cohorts
    • Semantically harmonize the cohort metadata
    • Build an online cohort atlas to enable discovery across IHCC cohorts

117 of 134

Building a common framework

  • Data models to represent both access conditions and cohort data
  • Tools and processes for implementations
  • Deployment over clinical cohorts

IHCC

118 of 134

DATA MODELS

COHORTS

TOOLS & PROCESSES

119 of 134

GA4GH Data Use Ontology

Jonathan Lawson

  • Vocabulary describing permitted data uses and modifiers
  • “General research use”, “disease-specific research”, “not for profit only”...

120 of 134

Genomics Cohort Knowledge Ontology (GECKO)

Fiona Brinkman

  • Commonly used attributes to describe cohort metadata
  • “Medication”, “sample type”, “genomics datatypes”...

121 of 134

Registry and mapping

TOOLS & PROCESSES

COHORTS

DATA MODELS

122 of 134

IHCC cohort registry

  • Human readability of cohort dictionaries
  • Versioning and change detection for dictionary updates
  • Built on the EMBL-EBI Ontology Lookup Service platform

  • Built-in interoperability with mapping/curation tools
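Change detection between two versions of a cohort data dictionary can be pictured as a simple diff. The sketch below is only an illustration under that assumption: the function name `diff_dictionaries` and the variable names and labels are hypothetical, and it does not describe how the registry itself is implemented.

```python
def diff_dictionaries(old, new):
    """Compare two {variable_name: label} dictionaries.

    Returns which variables were added, removed, or had their label changed.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical dictionary versions, invented for illustration.
old_version = {
    "smoking_status": "Current smoking status",
    "bmi": "Body mass index",
    "alcohol": "Alcohol intake",
}
new_version = {
    "smoking_status": "Current or former smoking status",
    "bmi": "Body mass index",
    "diabetes": "Self-reported diabetes",
}

changes = diff_dictionaries(old_version, new_version)
print(changes)
```

Only the added and changed entries would then need to be passed on for re-mapping or curation.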

123 of 134

IHCC cohort mappings

  • Stores mapping between GECKO and cohort terms
  • Accessible through APIs
  • Parameter to bridge between mappings: if A ⬄ B and B ⬄ C, then A ⬄ C can be inferred
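The bridging rule above (A ⬄ B and B ⬄ C implies A ⬄ C) can be sketched by grouping terms into equivalence classes with a union-find structure. This is an assumption about one way to compute it, not the mapping service's actual method; the term identifiers are hypothetical, and the inference is only sound when every mapping is an exact match rather than a broader or narrower one.

```python
def infer_equivalences(pairs):
    """Group terms into equivalence classes from pairwise exact mappings."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    # Merge the classes of each mapped pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


def are_equivalent(pairs, a, b):
    """True if a and b end up in the same inferred equivalence class."""
    return any(a in group and b in group for group in infer_equivalences(pairs))


# Hypothetical mappings between cohort terms and GECKO terms.
mappings = [
    ("cohortA:smoking_status", "GECKO:smoking"),
    ("GECKO:smoking", "cohortB:tobacco_use"),
    ("cohortA:bmi", "GECKO:bmi"),
]

print(are_equivalent(mappings, "cohortA:smoking_status", "cohortB:tobacco_use"))
print(are_equivalent(mappings, "cohortA:bmi", "cohortB:tobacco_use"))
```

Here the two cohort smoking terms are linked only through the shared GECKO term, which is the kind of bridge the bullet describes.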

124 of 134

Automated mapping pipeline for cohort owners

IHCC cohort registry

125 of 134

Applying these techniques to clinical cohorts...

TOOLS & PROCESSES

COHORTS

DATA MODELS

126 of 134

Initial set of cohorts

  • Focus on diversity
    • SAPRIN (South African Population Research Infrastructure Network)
    • Korean Genome and Epidemiology Study (KoGES)
    • Vukuzazi
    • Golestan Cohort Study (GCS)
    • Genomics England

127 of 134

OVERALL FRAMEWORK

128 of 134

OVERALL IHCC FRAMEWORK

129 of 134

IHCC cohort atlas

Reference to external cohort sites

Intuitive filtering by cohort metadata & data dictionary attributes

Cohort presentation and display

Christina Yung

Philip Awadalla

130 of 134

Pipeline can be reused

Models can be extended

Morris Swertz

131 of 134

DATA MODELS

COHORTS

TOOLS & PROCESSES

IHCC Cohort Atlas

132 of 134

Acknowledgements

James Overton

Rebecca Jackson

Nicolas Matentzoglu

Isuru Liyanage

Giselle Kerry

Mélanie Courtot

Thomas Keane

Philip Awadalla

Dan Brake

Chris Lunt

Eric Plummer

Christina Yung

Rosi Bajari

Minh Ha

Kim Cullion

133 of 134

Audience Q&A Session

134 of 134

Time for a break! Join us in 6 hours for:

21:00 UTC

22:00 UTC

23:00 UTC

Genetic Discrimination Observatory

Data Access Committee Review Standards (DACReS)

Cloud Work Stream Meeting

Federated Analysis Systems Project (FASP)

Variation Representation Specification (VRS) 1.3 Planning & Implementation Guidelines

VCR/VRS/refget Alignment

Variant Annotation

March 1 Working Sessions