A Resource For Helminth Genomics
APRIL 2, 2024
MATT BERRIMAN, SARAH DYER
Spring Meeting 2024,
University of Liverpool
Refreshments
Programme
Workshop materials: https://mberriman.github.io/BSP2024/
WormBase is an international consortium providing the research community with information concerning the genetics, genomics and biology of Caenorhabditis elegans and related nematodes.
C. elegans
C. brenneri
C. briggsae
C. japonica
C. remanei
Brugia malayi
Onchocerca volvulus
Pristionchus pacificus
Strongyloides ratti
Trichuris muris
…an increasing need for a new resource tailored to parasitologists…
A Bit of Background…
[Figure: parasite.wormbase.org usage statistics, 2023: Twitter followers, annual users, page views]
parasite.wormbase.org
Top Accessing Countries
USA
CHINA
UK
GERMANY
INDIA
Release 19: 275 genomes from 208 species (206 nematode and 69 platyhelminth genomes)
More genomes than ever before! (2024)
UPDATES SINCE RELEASE 7 (2016)
Platyhelminths: Rhabditophora, Tapeworms, Flukes, Monogenea
Nematodes: Clade I, Clade C, Clade V, Clade IV, Clade III
FROM WORMBASE PARASITE RELEASE 7 TO RELEASE 19
Genome Statistics
Contiguity describes how fragmented an assembly is, i.e. how many gaps it contains. In a perfect assembly, the number of scaffolds equals the number of chromosomes.
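N50: if all the scaffolds of an assembly are lined up from longest to shortest, N50 is the length of the scaffold at which the running total first reaches half of the assembled bases, so at least half of the assembled bases lie in scaffolds of length ≥ N50. The higher, the better!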
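To make the N50 definition concrete, here is a minimal Python sketch that computes it from a list of scaffold lengths. The function name `n50` and the toy lengths are illustrative only, not taken from WormBase ParaSite.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of scaffold (or contig) lengths.

    Sort longest to shortest and walk down the list until the running total
    reaches at least half of all assembled bases; that scaffold's length is N50.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # ">=" so that reaching exactly half of the bases also counts
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


# Toy assembly of five scaffolds (made-up lengths, 100 bases in total)
print(n50([40, 25, 15, 12, 8]))  # → 25
```

The same assembly split into fewer, longer scaffolds would give a higher N50, which is why the value tracks contiguity.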
Completeness: BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses genome completeness by looking for genes that are expected to be single-copy and highly conserved across a lineage (for example, across a broad range of eukaryotes). A higher percentage of complete, single-copy BUSCO genes indicates a more complete dataset.
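BUSCO results are usually read as complete genes (split into single-copy, S, and duplicated, D), fragmented (F) and missing (M), out of n genes in the chosen lineage dataset. The sketch below formats made-up counts in the style of BUSCO's one-line summary; the function `busco_summary` is hypothetical and is not part of BUSCO itself.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style counts as a one-line summary (C = complete = S + D)."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(count: int) -> str:
        # Percentage of the n lineage genes, to one decimal place
        return f"{100 * count / n:.1f}%"

    complete = single + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")


# Made-up counts for a hypothetical assembly searched with a 255-gene lineage set
print(busco_summary(240, 5, 4, 6))
# → C:96.1%[S:94.1%,D:2.0%],F:1.6%,M:2.4%,n:255
```

A high D value can point to uncollapsed haplotypes or genuine duplication, so the single-copy percentage is worth reading alongside C.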
More and higher-quality assemblies
FROM WORMBASE PARASITE RELEASE 7 TO RELEASE 19
[Figure: Assembly updates, Release 7 to Release 19 (145 genomes; subgroups of 86 and 59). Log(N50) against number of genomes for Release 7 and Release 19, split into short-read sequencing only and long-read sequencing involved]
Improved gene models?
FROM WORMBASE PARASITE RELEASE 7 TO RELEASE 19
[Figure: Annotation updates, Release 7 to Release 19, for 145 genomes; legend: long-read sequencing]
THE OBSERVED IMPROVEMENT IN ANNOTATION QUALITY DOES NOT MATCH THE BOOST IN ASSEMBLY QUALITY
Workshop, part one
Overview and aims
What is a gene?
Gene page
Transcript page
What does the encoded protein do?
Functional Annotation
Gene Ontology (GO): a formal representation of knowledge about a gene with respect to three aspects: molecular function, biological process and cellular component.
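GO terms form a directed acyclic graph, and under GO's "true path rule" a gene annotated to a term is implicitly annotated to all of that term's ancestors. The sketch below propagates direct annotations up a tiny, hypothetical and heavily simplified parent map (the real GO graph has many more intermediate terms); gene names are also hypothetical.

```python
# Hypothetical, heavily simplified parent map (child term -> parent terms).
# Real GO is a directed acyclic graph: a term can have several parents.
PARENTS: dict[str, set[str]] = {
    "protein kinase activity": {"kinase activity"},
    "kinase activity": {"catalytic activity"},
    "catalytic activity": {"molecular_function"},
    "molecular_function": set(),
}


def with_ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """Return the term together with every ancestor reachable via parent links."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, set()))
    return seen


def propagate(annotations: dict[str, set[str]],
              parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Expand each gene's direct annotations to include all ancestor terms."""
    return {
        gene: set().union(*(with_ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


# geneX is annotated directly only to the most specific term
expanded = propagate({"geneX": {"protein kinase activity"}}, PARENTS)
print(sorted(expanded["geneX"]))
# → ['catalytic activity', 'kinase activity', 'molecular_function', 'protein kinase activity']
```

Counting annotations after this kind of propagation is what lets a general term collect the genes annotated to its more specific children.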
By knowing which conserved domains a protein contains, you can often make accurate inferences about its function!
“a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites.”
InterPro
3D Protein Structure
Proteins fold into three-dimensional (3D) shapes, and a protein's 3D structure largely determines its function.
Knowing a protein's 3D structure gives a strong hint about how it works.
AlphaFold
https://alphafold.ebi.ac.uk/
Artificial Intelligence (AI) system
Computational prediction of protein structures with unprecedented accuracy and speed.
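AlphaFold reports a per-residue confidence score, pLDDT (0 to 100), and model files from the AlphaFold Protein Structure Database store it in the B-factor column of the PDB file. This minimal sketch reads those values from PDB-format text using fixed column positions; the function names are illustrative, and for real work a dedicated structure parser (for example Biopython's) is safer.

```python
def plddt_per_residue(pdb_text: str, chain: str = "A") -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor of C-alpha atoms.

    Uses fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    B-factor 61-66 (1-based), i.e. the Python slices used below.
    """
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, HEADER, END and other records
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue  # one value per residue: keep only the C-alpha atom
        scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Confidence bands used in the AlphaFold Protein Structure Database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"
```

A variant falling in a "very low" region sits in a part of the model whose predicted position is unreliable, which is worth remembering when visualising variants on these structures.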
Why would you want to find orthologues in other species?
If the function of a gene is not known, you can check the function of its orthologue in a better-studied species such as C. elegans, where it is more likely to have been characterised directly.
Orthologues often share many functions and roles.
Comparative Genomics
Orthologues and Paralogues
Further insights into gene function often come from exploring its evolutionary relationship to other genes, both within the same genome, and across different genomes.
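Reciprocal best hits (RBH) is one simple, widely used heuristic for proposing orthologue pairs between two genomes. It is shown here only to illustrate the idea and is not necessarily how WormBase ParaSite infers its orthologues and paralogues. Scores are assumed to be bit scores from an all-against-all similarity search (higher is better), and all gene names are hypothetical.

```python
def best_hits(hits: list[tuple[str, str, float]]) -> dict[str, str]:
    """For each query gene, keep the subject gene with the highest score."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: list[tuple[str, str, float]],
                         b_vs_a: list[tuple[str, str, float]]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Hypothetical hits between a parasite (Sm*) and C. elegans (Ce*)
a_vs_b = [("SmA", "CeA", 300.0), ("SmA", "CeB", 120.0), ("SmB", "CeB", 90.0)]
b_vs_a = [("CeA", "SmA", 310.0), ("CeB", "SmA", 130.0), ("CeB", "SmB", 95.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # → {('SmA', 'CeA')}
```

SmB is dropped because its best hit, CeB, prefers SmA: a one-to-many relationship created by gene duplication is exactly the case RBH handles poorly, and why tree-based orthologue and paralogue calls are more informative.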
Efficiently query WormBase ParaSite using BioMart
Echinococcus multilocularis | CDKD1.14
Echinococcus multilocularis | CLK2.7
Echinococcus multilocularis | EmuJ_000000300
Echinococcus multilocularis | EmuJ_000000800
Echinococcus multilocularis | EmuJ_000001200
Echinococcus multilocularis | EmuJ_000001400
Echinococcus multilocularis | EmuJ_000002000
Echinococcus multilocularis | EmuJ_000002800
Echinococcus multilocularis | EmuJ_000003100
Echinococcus multilocularis | EmuJ_000003600
GENE ONTOLOGIES
ALPHAFOLD
PROTEIN DOMAINS
ORTHOLOGUES
PARALOGUES
MULTI-GENE OUTPUT
https://mberriman.github.io/BSP2024/
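BioMart queries built in the web interface can also be expressed as an XML document and sent to a "martservice" endpoint, which is handy for repeating a multi-gene query. This sketch only builds such a URL; the endpoint is an assumption based on the usual BioMart convention, and the dataset, filter and attribute names are hypothetical placeholders to be replaced with the real ones exported from the BioMart web interface.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint following the usual BioMart "martservice" convention;
# confirm the current URL on the WormBase ParaSite BioMart page.
MARTSERVICE = "https://parasite.wormbase.org/biomart/martservice"


def biomart_query_url(dataset: str, filters: dict[str, str],
                      attributes: list[str]) -> str:
    """Build a BioMart GET URL carrying an XML query that asks for TSV output."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    xml_text = ET.tostring(query, encoding="unicode")
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": xml_text})


# HYPOTHETICAL dataset, filter and attribute names: copy the real ones from the
# BioMart web interface (its XML export) before running a query.
url = biomart_query_url(
    dataset="example_dataset",
    filters={"example_species_filter": "example_species_id"},
    attributes=["example_gene_id", "example_go_term_id"],
)
```

The query is only built here, not sent; opening `url` (for instance with `urllib.request.urlopen`) would be expected to return one tab-separated row per result, similar to the multi-gene output above.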
Part 2
6. Overview and Aims
7. Tools
BLAST
EXERCISE
The genome browser
VEP
EXERCISE
8. The WormBase ParaSite Expression browser
EXERCISE
9. Gene-set enrichment analysis
EXERCISE
BLAST
[Figure: example nucleotide sequences]
BLAST compares nucleotide or protein sequences against all sequences in WormBase ParaSite using local alignments.
Gene | Species | E-value
Gene A | S. haematobium | 4.3e-63
Gene B | S. haematobium | 1e-20
Gene C | S. bovis | 0.05
… | … | …
BLAST calculates an expectation value (E-value): the number of hits with at least this score expected to be seen by chance when searching a database of the same size. The lower the E-value, the more significant the hit.
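The E-value can be made concrete with the Karlin-Altschul formula behind BLAST statistics, E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m and n are the query and database lengths, and K and lambda are constants of the scoring system. The sketch also shows the equivalent bit-score form. The K and lambda values in the usage line are illustrative only, and real BLAST additionally corrects m and n for edge effects, so its reported E-values will differ somewhat.

```python
import math


def evalue_from_raw(raw_score: float, k: float, lam: float, m: int, n: int) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, k: float, lam: float) -> float:
    """Normalised (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """The same E-value written with bit scores: E = m * n * 2 ** (-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative constants: a 300-residue query against a 10^8-residue database
k, lam, m, n = 0.041, 0.267, 300, 10**8
print(evalue_from_raw(60, k, lam, m, n), evalue_from_bits(bit_score(60, k, lam), m, n))
```

Because E scales with m * n, the same alignment score gives a larger (less significant) E-value when the database grows, which is why the definition refers to "a database of the same size".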
Genome Browsers
Visualising variants on AlphaFold models
FROM A VCF FILE WITH VARIANTS…
…TO PREDICTING THEIR EFFECT…
…TO VISUALISING THEM ON 3D PROTEIN MODELS…
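The first two steps above (reading variants from a VCF file, then predicting their effect) can be sketched for the simplest case, a single-base substitution in a coding sequence. Tools such as VEP do this properly using transcript models; this toy version assumes the variant has already been mapped to a 1-based position in a forward-strand CDS, and it ignores indels, splicing, start codons and the reverse strand. The consequence labels follow Sequence Ontology-style names; the function names are hypothetical.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_vcf_line(line: str) -> dict:
    """Split one VCF data line (not a '#' header line) into its first five columns."""
    chrom, pos, var_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "id": var_id,
            "ref": ref, "alt": alt.split(",")}


def snv_consequence(cds: str, cds_pos: int, alt: str) -> str:
    """Classify a single-base change at 1-based position cds_pos of a CDS."""
    cds = cds.upper()
    start = (cds_pos - 1) // 3 * 3        # first base of the affected codon
    offset = (cds_pos - 1) % 3            # position of the change within it
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


# Hypothetical 3-codon CDS ATG TGG TAA: changing base 6 turns TGG (Trp) into TGA (stop)
print(snv_consequence("ATGTGGTAA", 6, "A"))  # → stop_gained
```

A missense call is the kind of result that is then worth locating on the 3D protein model.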
Explore, use and analyse transcriptomics data
VISUALISE RNA-SEQ DATASETS BY CONDITION
EXPLORE DIFFERENTIAL GENE EXPRESSION EXPERIMENTS
PERFORM GENE-SET ENRICHMENT ANALYSIS
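Comparing expression across genes and conditions relies on normalised values. One common normalisation is TPM (transcripts per million), and differential expression is usually summarised as a log2 fold change. The sketch below shows only the arithmetic, assuming raw read counts and transcript lengths are available; real differential expression tools (e.g. DESeq2) model raw counts statistically across replicates, and the pseudocount of 1 is an arbitrary illustrative choice.

```python
import math


def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million: length-normalise counts, then scale to sum to 1e6."""
    rate = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}  # reads per kb
    scale = sum(rate.values())
    if scale == 0:
        raise ValueError("no reads in this sample")
    return {g: r / scale * 1e6 for g, r in rate.items()}


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2((b + pc) / (a + pc)); the pseudocount avoids dividing by zero."""
    return math.log2((b + pseudocount) / (a + pseudocount))


# Hypothetical genes: g2 has twice the reads of g1 but is twice as long
sample = tpm({"g1": 10, "g2": 20}, {"g1": 1000, "g2": 2000})
print(sample, log2_fold_change(1, 3))  # g1 and g2 get equal TPM; fold change is 1.0
```

Length normalisation is why g2's extra reads do not make it look more highly expressed than g1.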
Gene-set enrichment analysis
A METHOD USED TO DETERMINE WHICH BIOLOGICAL PROCESSES, MOLECULAR FUNCTIONS, AND CELLULAR COMPONENTS ARE SIGNIFICANTLY ASSOCIATED WITH A SET OF GENES OR PROTEINS.
FROM A LIST OF GENES…
…TO A LIST OF ENRICHED TERMS OF GENE FUNCTION.
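Over-representation of a term in a gene list is commonly tested with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), followed by correction for testing many terms at once. This is the textbook version, shown as a sketch; it is not necessarily the exact statistics used by the WormBase ParaSite enrichment tool, and the numbers are made up.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in your list, k: genes in your list annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up example: all 5 listed genes carry a term held by 5 of 10 background genes
print(enrichment_p_value(5, 5, 5, 10))  # 1/252, about 0.004
```

Without the correction step, testing thousands of GO terms would yield many small p-values by chance alone.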
Show your support!
SUBSCRIBE TO OUR MAILING LIST
FOLLOW US ON TWITTER
HELPDESK
@WBParaSite
parasite-help@wormbase.org
John Tate, Ensembl Web Back-end Project Leader
EMBL's European Bioinformatics Institute (EMBL-EBI)
Prof. Matthew Berriman, PI
School of Infection & Immunity,
College of Medical, Veterinary & Life Sciences,
University of Glasgow
Dr. Sarah Dyer, PI
EMBL's European Bioinformatics Institute (EMBL-EBI)
Steph Brown, Bioinformatician
School of Infection & Immunity,
College of Medical, Veterinary & Life Sciences,
University of Glasgow
Mehrnaz Charkhchi, Ensembl Web Back-end Developer
EMBL's European Bioinformatics Institute (EMBL-EBI)
How do we currently annotate?
A FOCUS ON ANNOTATION
BRAKER PIPELINE
MANUAL REVIEW IS ALWAYS NECESSARY BUT RARELY AVAILABLE
WHAT'S WRONG WITH THE 5' END OF THIS GENE?
SOLUTION
SPLIT AT THE FIRST INTRON TO MAKE A NEW 5' GENE?
APPEARS TO BE TWO QUITE DIFFERENT RNA-SEQ COVERAGE LEVELS
...INDEPENDENT START SITES
Steve Doyle
APOLLO ANNOTATION TOOL
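The reasoning on this slide, that two quite different RNA-seq coverage levels within one gene model can hint at independent start sites, can be sketched as a simple comparison of mean depth either side of a candidate split point. The function, the per-base coverage input and the 3-fold threshold are all hypothetical illustrations; in practice this evidence is reviewed visually in Apollo alongside the read alignments.

```python
from statistics import mean


def coverage_step(coverage: list[float], split: int,
                  min_fold: float = 3.0) -> tuple[float, bool]:
    """Compare mean RNA-seq depth either side of a candidate split point.

    Returns the fold difference (higher mean / lower mean) and whether it
    reaches the hypothetical min_fold threshold.
    """
    left, right = coverage[:split], coverage[split:]
    if not left or not right:
        raise ValueError("split must leave bases on both sides")
    lo, hi = sorted((mean(left), mean(right)))
    if hi == 0:
        fold = 1.0            # no coverage on either side: no evidence of a step
    elif lo == 0:
        fold = float("inf")   # reads on one side only
    else:
        fold = hi / lo
    return fold, fold >= min_fold


# Made-up depths: low coverage over the 5' exons, much higher further downstream
print(coverage_step([5] * 10 + [50] * 10, split=10))  # → (10.0, True)
```

A step in coverage is only a hint: it still needs supporting evidence, such as reads consistent with a separate transcription start, before a gene model is split.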
...AND THE SPLIT IS SUPPORTED BY RNA-SEQ EVIDENCE
Z
W
APOLLO ANNOTATION TOOL
ARTEMIS COMPARISON TOOL
THE W LOCUS IS ALREADY ANNOTATED AS TWO GENES...
Description of gene function
AN AREA THAT STILL NEEDS CURATION
FUNCTIONAL ANNOTATION
GENE SYNONYM IMPORTED FOR STRONGYLOIDES STERCORALIS
A FOCUS ON ANNOTATION
Community annotation
DO I WANT TO GET INVOLVED?
HOW MUCH TIME COULD I SPEND ON IT?
WHAT WOULD I WANT TO GET OUT OF IT?
WHAT SHOULD BE THE FOCUS?
User comments on gene pages
A FORM OF COMMUNITY ANNOTATION
Gene Name
Gene Structure
Gene Description