Surveying the landscape of effector gene prediction
Laura Harris, EMBL-EBI
Maria Costanzo, Broad Institute
Knowledge Portal Network
GWAS Catalog
Effector gene prediction is a key output of GWAS
Current resource landscape
Goal: a FAIR PEG data standard that works for all 3 resources & the community, enabling:
Predicting effector genes for complex diseases and traits
Gene prioritizations and predicted effector gene (PEG) lists
First list of predicted effector genes (PEGs) in the T2D Knowledge Portal
Heuristic to combine evidence types into a categorization of evidence strength
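One way such a heuristic could look, as a minimal sketch only: the deck does not specify the actual rules used in the T2D Knowledge Portal, so the weights, thresholds, and category names below are all hypothetical. The evidence type names are borrowed from the evidence types listed later in this deck.

```python
from typing import Iterable

# Hypothetical weights per evidence type (assumption, not the portal's real heuristic).
EVIDENCE_WEIGHTS = {
    "mouse_mutant_phenotype": 2,
    "eqtl": 2,
    "tissue_specific_expression": 1,
    "depict_score": 1,
}


def categorize_evidence(evidence_types: Iterable[str]) -> str:
    """Map the set of evidence types supporting a gene to a strength category.

    Thresholds (4, 2, 1) and labels are illustrative assumptions.
    Unknown evidence types contribute nothing to the score.
    """
    score = sum(EVIDENCE_WEIGHTS.get(e, 0) for e in set(evidence_types))
    if score >= 4:
        return "strong"
    if score >= 2:
        return "moderate"
    if score >= 1:
        return "weak"
    return "no evidence"


example = categorize_evidence(["eqtl", "mouse_mutant_phenotype"])
print(example)
```

A rule-based mapping like this keeps the categorization transparent: each category can be traced back to the evidence types that produced it.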
Investigating trends in gene prioritization
5,140 papers loaded by GWAS Catalog from 2012-2022
169 papers with systematic gene prioritization, across 157 traits
Scan titles and abstracts
Mention of gene prioritization
Scan full text
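The two counts above imply the share of loaded papers that include systematic gene prioritization. A short check of the arithmetic, using only the figures on this slide (variable names are ours):

```python
# Figures taken from the slide: papers loaded 2012-2022 and those with
# systematic gene prioritization.
total_papers = 5140
prioritization_papers = 169

share = prioritization_papers / total_papers
print(f"{share:.1%} of loaded papers include systematic gene prioritization")
```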
Number of papers incorporated into the GWAS Catalog that include systematic gene prioritization (blue bars, left vertical axis) and percent of total papers added to the GWAS Catalog (red bars, right vertical axis), by year of publication.
Variant-centric evidence
Are any of these specific to disease-relevant tissues?
Start by identifying the causal variant, find evidence about its impact
Gene-centric evidence
Start with genes in GWAS loci, find evidence about their function
Pipelines
How many evidence types are used per study?
Are there trends in usage of specific evidence types?
Gene prioritizations vs. PEG lists
Mouse mutant phenotype evidence for genes at GWAS loci
eQTL evidence for genes at GWAS loci
DEPICT gene prioritization scores for genes at GWAS loci
Gene set enrichment analysis for genes at GWAS loci
Tissue and cell type annotation enrichment for genes at GWAS loci
Tissue-specific expression evidence for genes at GWAS loci
25% of papers included gene prioritization only
75% of papers integrated all evidence in a PEG list
Some PEG lists are presented only as images
Table in image format
Graphics presented without their underlying data
A major difference in information content: all genes per locus vs. top gene only
All genes per locus (71% of papers)
Top gene per locus (29% of papers)
Scoring system vs. no scoring
Scoring system (29% of papers)
No scoring system (71% of papers)
Comparing predictions for the same trait
PEG list 1
PEG list 2
Find shared loci between lists 1 and 2
How often does the highest-ranked gene at a locus from list 1 match the top-ranked gene from list 2?
PEG list 1: multiple genes per locus
PEG list 2: top gene only
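The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually performed: it assumes both lists already use the same locus identifiers (real loci would need to be matched by genomic position), that list 1 carries a numeric score per gene where higher means better ranked, and that list 2 gives one gene per locus. All function names and example data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# List 1: locus -> [(gene, score), ...]; list 2: locus -> top gene.
MultiGeneList = Dict[str, List[Tuple[str, float]]]
TopGeneList = Dict[str, str]


def top_gene_per_locus(peg_list: MultiGeneList) -> TopGeneList:
    """Reduce a multi-gene list to its highest-scoring gene per locus."""
    return {
        locus: max(genes, key=lambda g: g[1])[0]
        for locus, genes in peg_list.items()
        if genes  # skip loci with no candidate genes
    }


def top_gene_concordance(
    list1: MultiGeneList, list2: TopGeneList
) -> Tuple[List[str], int, Optional[float]]:
    """Return shared loci, number of matching top genes, and the match rate."""
    top1 = top_gene_per_locus(list1)
    shared = sorted(set(top1) & set(list2))
    matches = sum(1 for locus in shared if top1[locus] == list2[locus])
    rate = matches / len(shared) if shared else None
    return shared, matches, rate


# Made-up example data for illustration only.
peg_list_1 = {
    "locus_A": [("GENE1", 0.9), ("GENE2", 0.4)],
    "locus_B": [("GENE3", 0.2), ("GENE4", 0.7)],
    "locus_C": [("GENE5", 0.5)],
}
peg_list_2 = {"locus_A": "GENE1", "locus_B": "GENE3", "locus_D": "GENE6"}

shared, matches, rate = top_gene_concordance(peg_list_1, peg_list_2)
print(shared, matches, rate)
```

Reducing list 1 to its top gene per locus first puts both lists on the same footing, which is why the difference between "all genes per locus" and "top gene only" formats matters for this kind of comparison.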
Minimal standards for PEG list metadata
Minimal standards for PEG list content and format
PEG standard development timeline
Initial community workshop
Sept 2024
Submit landscape manuscript
Refinement of draft standard
Convening working group
Landscape article published in NG
1st WG meeting
Recap & feedback on draft standard
2nd WG meeting
Focus on data matrix
3rd WG meeting
Focus on metadata
Kickoff benchmarking activity
4th WG meeting
Benchmarking results
PEG list format
Dec 2024
Spring 2025
April 2025
May 2025
June 2025
July 2025
Sept 2025
Oct 2025
5th WG meeting
PEG list format
Summary & run-through
Ancillary session