PCGL Driver Project

GA4GH

Mélanie Courtot, David Bujold and Ma’n Zawati

2024-04-23

Team

Dream team of Canadian genomicists

Why a pan-Canadian framework for genome data?

Unify Canada’s human genome sequencing efforts to prevent redundant work across many existing National Genomic Data Generation Projects

Set out a federated data management system that leverages international standards and respects jurisdictional and cultural limitations on the movement of human genetic data

$15,000,000 over 5 years

Impact of the PCGL

To enable access to and analysis of genomes sequenced in Canada to further research

To provide means and opportunities to facilitate clinical trials

To address equity, diversity and inclusion by capturing the Canadian genomic “variome”

To provide a key mechanism to improve effectiveness of healthcare delivery

To position Canada as a key player in international genomic research endeavours

Structure of the PCGL project

PCGL data flows

PCGL initial driving use cases

HostSeq

Led by CGEn, HostSeq contains whole genome sequences (WGS) and healthcare data for 11,000 individuals recruited into 15 Canadian clinical studies during the COVID-19 pandemic.

Rare disease community

Includes the AllforOne, Care4Rare and Genomics4RD initiatives. AllforOne is expected to sequence approximately 9,000 samples as part of clinical care for rare diseases.

MOHCCN

Led by the Terry Fox Research Institute to accelerate the adoption of precision medicine by uniting cancer centre efforts across Canada, MOHCCN plans to generate 15,000 WGS.

Silent Genomes

The Silent Genomes project is studying barriers to genetic/genomic health care for Indigenous peoples of Canada, with initial data from 600 First Nations participants.

Involvement in GA4GH

Partial table of WS participation: Champions

Name / Engagement

David Bujold

  • Lead of Experiments Metadata Standards
  • Co-lead of Beacon Aggregation scout
  • Discovery, Clin/Pheno (Beacon, Phenopackets, RNAget, DaMaSC)

Mélanie Courtot

  • Lead of DUO group
  • Co-lead of Clin/Pheno

Karen Cranston

  • DURI, DSWS
  • Experiments Metadata Standards

Yann Joly

  • Co-lead of REWS
  • Genetic Discrimination, Clinical Data-Sharing, Metrics, Data Visiting

Partial table of WS participation: Implementation

Name / Engagement

Guillaume Bourque

  • LSG, DURI, Discovery WS

Michael Brudno

  • REWS, DURI, Clin/Pheno WS

Jon Eubank

  • Co-lead of Beacon Aggregation scout

Daisie Huang

  • Implementation: Beacon, VRS, htsget
  • GKS work stream

Gordon Krieger

  • Implementation: Beacon

Francis Nguyen

  • Implementation: DRS, WES, TES
  • Cloud work stream

Ma’n H. Zawati

  • Generative/Conversational AI and genomic data sharing
  • REWS

PCGL Technical Components

Data Submission, Archival, Processing

Phenopackets as a way to submit clinical data to the PCGL

Experiments Metadata to properly characterize experiments that generate sequencing data

Variants submission and storage as VCF

Data deposition and retrieval using DRS
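
As a rough illustration of DRS-based retrieval, the sketch below builds the object URL defined by the GA4GH DRS v1 API (`/ga4gh/drs/v1/objects/{object_id}`) and picks an access method from a returned object. The host `drs.example.org`, the object ID and the sample response are hypothetical placeholders, not PCGL endpoints or data.

```python
from urllib.parse import quote

# Hypothetical base URL; a real deployment would publish its own.
DRS_BASE = "https://drs.example.org/ga4gh/drs/v1"


def drs_object_url(object_id: str, base: str = DRS_BASE) -> str:
    """Return the URL that serves the metadata of one DRS object."""
    return f"{base}/objects/{quote(object_id, safe='')}"


def pick_access_method(drs_object: dict, preferred=("https", "s3")) -> dict:
    """Return the first access method whose type matches the preference order."""
    methods = drs_object.get("access_methods", [])
    for wanted in preferred:
        for method in methods:
            if method.get("type") == wanted:
                return method
    raise LookupError("no supported access method on this DRS object")


# Made-up DrsObject, trimmed to the fields used in this sketch.
sample_object = {
    "id": "cram-0001",
    "size": 1024,
    "access_methods": [
        {"type": "s3", "access_url": {"url": "s3://example-bucket/cram-0001.cram"}},
        {"type": "https", "access_id": "primary"},
    ],
}

print(drs_object_url(sample_object["id"]))
print(pick_access_method(sample_object)["type"])
```

When an access method carries only an `access_id`, as the `https` entry does here, the DRS specification has the client request `/objects/{object_id}/access/{access_id}` to obtain the actual download URL.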

Aligned readsets stored as CRAM

Pipeline execution standardized using WES
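
To make the WES point concrete, here is a minimal sketch of the form fields a client would send to `POST /ga4gh/wes/v1/runs`, plus a check for terminal run states. The workflow URL, the parameter names `reads` and `reference`, and the DRS URI are invented for illustration and do not describe actual PCGL pipelines.

```python
import json

# Run states after which a WES run no longer changes (from the WES v1 state enum).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_run_request(workflow_url, workflow_type, workflow_type_version, params):
    """Return the form fields for a WES run submission."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # WES expects the workflow parameters as a JSON-encoded string.
        "workflow_params": json.dumps(params, sort_keys=True),
    }


def is_finished(state: str) -> bool:
    """True when polling the run status can stop."""
    return state in TERMINAL_STATES


# Hypothetical alignment workflow and inputs.
request_fields = build_run_request(
    workflow_url="https://example.org/workflows/align.cwl",
    workflow_type="CWL",
    workflow_type_version="v1.0",
    params={"reads": "drs://drs.example.org/cram-0001", "reference": "GRCh38"},
)

print(request_fields["workflow_params"])
print(is_finished("RUNNING"), is_finished("COMPLETE"))
```

Because the request shape is the same for every compliant server, the same submission could in principle be sent to any node that exposes a WES endpoint.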

Data Access

Implementation of Authentication / Authorization across the federation using Passports for:

  • Data submission in the Submission Portal
  • Researcher data access in the Researcher Portal, Variants database, etc.
  • Federated node data access within the network
  • The DACO portal
  • The Participants portal
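
The sketch below illustrates how a service behind one of these portals might read a GA4GH Passport: the `ga4gh_passport_v1` claim holds a list of visa JWTs, and each visa carries a `ga4gh_visa_v1` object with fields such as `type` and `value`. This is only an assumption-laden illustration: it does not verify signatures (a real service must), and the dataset identifier, the `daco.example.org` source and the `demo_jwt` helper are hypothetical.

```python
import base64
import json
import time


def _b64url_decode(segment: str) -> bytes:
    """Decode base64url text, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_payload(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


def has_controlled_access(passport_claims: dict, dataset: str, now=None) -> bool:
    """True if an unexpired ControlledAccessGrants visa names the dataset."""
    now = time.time() if now is None else now
    for visa_jwt in passport_claims.get("ga4gh_passport_v1", []):
        claims = jwt_payload(visa_jwt)
        visa = claims.get("ga4gh_visa_v1", {})
        if (
            visa.get("type") == "ControlledAccessGrants"
            and visa.get("value") == dataset
            and claims.get("exp", 0) > now
        ):
            return True
    return False


def demo_jwt(payload: dict) -> str:
    """Build an unsigned, demo-only token so the sketch runs offline."""
    def enc(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(payload)}.demo"


DATASET = "https://pcgl.example.org/datasets/demo"  # hypothetical dataset ID
visa = demo_jwt({
    "exp": 4102444800,  # 2100-01-01 UTC, far in the future for the demo
    "ga4gh_visa_v1": {
        "type": "ControlledAccessGrants",
        "value": DATASET,
        "source": "https://daco.example.org",  # hypothetical granting body
        "by": "dac",
        "asserted": 1700000000,
    },
})
passport = {"ga4gh_passport_v1": [visa]}

print(has_controlled_access(passport, DATASET))
```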

Data Discovery & Download

Datasets annotated with DUO codes to characterize usage
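
As a small illustration of DUO-based annotation, the sketch below tags a hypothetical catalogue with DUO term IDs and filters it by intended use. The dataset names and the `ALLOWED_BY` matching table are simplifying assumptions made for this example; real DUO matching also has to account for modifiers and disease-specific terms.

```python
# DUO terms used in this sketch (IDs and labels from the Data Use Ontology).
DUO_LABELS = {
    "DUO:0000004": "no restriction",
    "DUO:0000042": "general research use",
    "DUO:0000006": "health or medical or biomedical research",
    "DUO:0000007": "disease specific research",
}

# Simplified assumption: dataset permissions that allow each requested purpose.
ALLOWED_BY = {
    "general research": {"DUO:0000004", "DUO:0000042"},
    "biomedical research": {"DUO:0000004", "DUO:0000042", "DUO:0000006"},
}

# Hypothetical catalogue entries, not real PCGL datasets.
DATASETS = [
    {"id": "dataset-a", "duo": "DUO:0000042"},
    {"id": "dataset-b", "duo": "DUO:0000006"},
    {"id": "dataset-c", "duo": "DUO:0000007"},
]


def usable_for(purpose: str, datasets=DATASETS) -> list:
    """Return the IDs of datasets whose DUO code permits the purpose."""
    allowed = ALLOWED_BY[purpose]
    return [d["id"] for d in datasets if d["duo"] in allowed]


for purpose in ALLOWED_BY:
    print(purpose, "->", usable_for(purpose))
```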

Beacon v2

  • Running discovery queries in the PCGL federation through a Beacon Network
  • Connecting the PCGL Network to other international initiatives
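
A federated discovery query of this kind can be sketched as follows: one Beacon v2 style request body is sent to every node, and the per-node `responseSummary` objects are combined. The field names follow the Beacon v2 request and response model as understood here; the node names, the variant and the canned responses are invented, and no network call is made.

```python
def build_variant_query(reference_name, start, ref, alt,
                        assembly="GRCh38", granularity="boolean"):
    """Return a Beacon v2 style request body for a single-variant lookup."""
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": {
                "assemblyId": assembly,
                "referenceName": reference_name,
                "start": [start],
                "referenceBases": ref,
                "alternateBases": alt,
            },
            "requestedGranularity": granularity,
        },
    }


def aggregate(node_responses: dict) -> dict:
    """Combine the responseSummary of each node into one network-level answer."""
    summaries = {
        node: response.get("responseSummary", {})
        for node, response in node_responses.items()
    }
    return {
        "exists": any(s.get("exists", False) for s in summaries.values()),
        "numTotalResults": sum(s.get("numTotalResults", 0) for s in summaries.values()),
        "nodesWithHits": sorted(n for n, s in summaries.items() if s.get("exists")),
    }


query = build_variant_query("17", 43045702, "G", "A", granularity="count")

# Canned responses standing in for three hypothetical federation nodes.
responses = {
    "node-qc": {"responseSummary": {"exists": True, "numTotalResults": 3}},
    "node-on": {"responseSummary": {"exists": False, "numTotalResults": 0}},
    "node-bc": {"responseSummary": {"exists": True, "numTotalResults": 2}},
}

print(query["query"]["requestedGranularity"])
print(aggregate(responses))
```

Returning only booleans or counts at the network level keeps record-level data on the node that holds it, which fits a federated model where data movement is restricted.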

Streaming of deposited data using htsget / RNAget (once access has been granted)
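
For readers unfamiliar with htsget, the sketch below shows its two steps: request a ticket for a genomic region, then concatenate the blocks the ticket lists. The host `htsget.example.org`, the read ID and the ticket contents are hypothetical, `fetch` is an injected stand-in for an authenticated HTTP GET, and inline `data:` URIs are assumed to be base64-encoded.

```python
import base64
from urllib.parse import urlencode

HTSGET_BASE = "https://htsget.example.org"  # hypothetical server


def reads_ticket_url(read_id, reference_name, start, end,
                     fmt="CRAM", base=HTSGET_BASE):
    """Return the ticket URL for a region (0-based start, exclusive end)."""
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base}/reads/{read_id}?{query}"


def assemble(ticket: dict, fetch) -> bytes:
    """Concatenate, in order, every block listed in an htsget ticket."""
    chunks = []
    for entry in ticket["htsget"]["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            # Inline block: decode the base64 payload after the comma.
            chunks.append(base64.b64decode(url.split(",", 1)[1]))
        else:
            chunks.append(fetch(url, entry.get("headers", {})))
    return b"".join(chunks)


# Made-up ticket with one inline block and one remote block.
ticket = {
    "htsget": {
        "format": "CRAM",
        "urls": [
            {"url": "data:application/octet-stream;base64,QUJD"},
            {"url": "https://data.example.org/block/1",
             "headers": {"Authorization": "Bearer demo"}},
        ],
    }
}


def fake_fetch(url, headers):
    """Offline stand-in for an HTTP GET that returns fixed bytes."""
    return b"DEF"


print(reads_ticket_url("sample-1", "chr1", 0, 1000))
print(assemble(ticket, fake_fetch))
```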

Clinical data exportable in Phenopackets
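
A minimal sketch of what such an export could look like, using the JSON field names of the Phenopacket Schema v2 (`subject`, `phenotypicFeatures`, `metaData`). The identifiers, the creator name and the single HPO term are placeholders chosen for illustration; the resource `version` is left unspecified rather than guessed.

```python
import json


def make_phenopacket(packet_id, subject_id, sex, hpo_terms, created, created_by):
    """Build a minimal Phenopacket-style dict from (term_id, label) pairs."""
    return {
        "id": packet_id,
        "subject": {"id": subject_id, "sex": sex},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in hpo_terms
        ],
        "metaData": {
            "created": created,
            "createdBy": created_by,
            "resources": [{
                "id": "hp",
                "name": "Human Phenotype Ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": "unspecified",  # placeholder, set to the release used
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }],
            "phenopacketSchemaVersion": "2.0",
        },
    }


# Hypothetical participant record.
packet = make_phenopacket(
    packet_id="packet-0001",
    subject_id="participant-0001",
    sex="FEMALE",
    hpo_terms=[("HP:0001250", "Seizure")],
    created="2024-04-23T00:00:00Z",
    created_by="example-exporter",
)

print(json.dumps(packet, indent=2))
```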

Governance, Ethics, International & Commercial Partnerships

  • Policy development (Governance Framework, privacy Consent filters, DAC and implementation of Access Office) includes:
    • aligning with relevant regulatory frameworks within and outside of Canada to promote interoperable, coherent policy building inspired by the GA4GH experience, especially the toolkit of its Regulatory and Ethics Work Stream (REWS).

  • International and commercial partnerships are a key aspect of PCGL; partnering with GA4GH will allow us to benefit from the organization’s vast network of collaborators.

Deliverables:

    • a Strategic plan to advance commercial and international partnerships
    • a Model Partnership Agreement
    • a Partnership Governance Framework

Interoperability with other GA4GH Driver Projects

Relying on GA4GH standards to strengthen connections across initiatives

  • Discoverability through the Beacon v2 API between GDI and PCGL
    • Can we build a Beacon Network that allows data from both initiatives to be used together?
  • Linking to the EGA federation by setting up the CGA node
  • Making use of ontologies that other Driver Projects are leveraging (currently being assessed by DaMaSC)

Thank you!

The Pan-Canadian Genome Library gratefully acknowledges the support of CIHR