1 of 40

A Resource For Helminth Genomics

APRIL 2, 2024

MATT BERRIMAN, SARAH DYER

Spring Meeting 2024, University of Liverpool

2 of 40

Programme

  • Introduce WormBase ParaSite and our latest release
  • Introduce workshop topics
  • Workshop part 1
  • Refreshments
  • Workshop part 2
  • Discussion on annotation

Workshop materials: https://mberriman.github.io/BSP2024/

3 of 40

A Bit of Background…

WormBase is an international consortium providing the research community with information concerning the genetics, genomics and biology of Caenorhabditis elegans and related nematodes.

  • C. elegans
  • C. brenneri
  • C. briggsae
  • C. japonica
  • C. remanei
  • Brugia malayi
  • Onchocerca volvulus
  • Pristionchus pacificus
  • Strongyloides ratti
  • Trichuris muris

…increasing need for a new resource meeting the needs of parasitologists…

4 of 40

parasite.wormbase.org in 2023

[Animated counters: Twitter Followers, Annual Users, Page Views]

5 of 40

parasite.wormbase.org in 2023

[Animated counters: Twitter Followers, Annual Users, Page Views]

Top Accessing Countries

  • USA
  • CHINA
  • UK
  • GERMANY
  • INDIA

6 of 40

More genomes than ever before!

UPDATES SINCE RELEASE 7 (2016)

2024:
  • Release: 19
  • Species: 208
  • Genomes: 275 (Platyhelminths: 69, Nematodes: 206)

7 of 40

More genomes than ever before! (2024)

Platyhelminths
  • Rhabditophora
  • Tapeworms
  • Flukes
  • Monogenea

Nematodes
  • Clade I
  • Clade C
  • Clade V
  • Clade IV
  • Clade III

8 of 40

FROM WORMBASE PARASITE RELEASE 7 TO RELEASE 19

Genome Statistics

Contiguity describes how fragmented an assembly is. In a perfect assembly there are no gaps and the number of assembly scaffolds equals the number of chromosomes.

N50: if all of the scaffolds of the assembly are lined up, longest to shortest, N50 is the length of the scaffold at which half of the assembled bases have been reached, so half the assembled bases are in scaffolds of length ≥ N50. The higher, the better!
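The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not code used by WormBase ParaSite; the function name `n50` and the toy scaffold lengths are hypothetical.

```python
def n50(scaffold_lengths):
    """Return the N50 of an assembly, given its scaffold lengths in bases."""
    if not scaffold_lengths:
        raise ValueError("need at least one scaffold length")
    # Line the scaffolds up, longest to shortest.
    ordered = sorted(scaffold_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that takes us to half of all bases defines N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly: six scaffolds, 290 bases in total.
toy_assembly = [80, 70, 50, 40, 30, 20]
print(n50(toy_assembly))
```

In the toy assembly the two longest scaffolds (80 + 70 = 150 bases) already cover more than half of the 290 assembled bases, so the N50 is the length of the second scaffold.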

Completeness: BUSCO assesses genome completeness by looking for genes that are single-copy and highly conserved across a broad range of eukaryotes. A higher percentage of single-copy BUSCO genes indicates a more complete dataset.

  • BUSCO ASSEMBLY assesses the assembly quality of a genome by predicting a gene set ab initio using AUGUSTUS.
  • BUSCO ANNOTATION assesses annotation completeness by looking for BUSCO genes within existing gene predictions.

9 of 40

More and higher quality assemblies

FROM WORMBASE PARASITE RELEASE 7 TO RELEASE 19

[Figure: axes Log(N50) and Number of Genomes; series Release 7 and Release 19; legend: short-read sequencing only, long-read sequencing involved]

Assembly updates, Release 7 to Release 19: 145 (86 + 59)

10 of 40

Improved gene models?

FROM WORMBASE PARASITE RELEASE 7 TO RELEASE 19

Annotation updates from Release 7 to Release 19: 145

[Figure: Release 7 and Release 19 compared; long-read sequencing highlighted]

THE OBSERVED ANNOTATION QUALITY IMPROVEMENT DOES NOT MATCH THE BOOST IN ASSEMBLY QUALITY

11 of 40

Workshop, part one

Overview and aims

  • Browsing genomes
  • Browsing gene/transcript pages
  • Querying WormBase ParaSite with BioMart

12 of 40

What is a gene?

13 of 40

What is a gene?

Gene page

14 of 40

What is a gene?

Gene page

Transcript page

15 of 40

What is a gene?

Gene page

Transcript page

16 of 40

What does the encoded protein do?

17 of 40

Functional Annotation

Gene Ontology (GO): a formal representation of knowledge about a gene with respect to three aspects:

  • Molecular Function
  • Cellular Component
  • Biological Process

18 of 40

Functional Annotation

Usually, by knowing which conserved domains exist in a protein, you can make accurate inferences about its function!

InterPro: “a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites.”

19 of 40

3D Protein Structure

Proteins are 3D molecules, and their 3D structure determines their function.

Knowing a protein's 3D structure gives a strong hint as to how the protein works.

AlphaFold (https://alphafold.ebi.ac.uk/): an Artificial Intelligence (AI) system for computational prediction of protein structures with unprecedented accuracy and speed.

20 of 40

Comparative Genomics

Orthologues and Paralogues

Further insights into a gene's function often come from exploring its evolutionary relationship to other genes, both within the same genome and across different genomes.

Why would you want to find orthologues in other species?

If the function of a gene is not known, you can check the function of its orthologue in other species such as C. elegans, where direct characterisation is more likely to have occurred.

Orthologues often share many functions/roles.

21 of 40

Efficiently query WormBase ParaSite using BioMart

  • GENE ONTOLOGIES
  • ALPHAFOLD
  • PROTEIN DOMAINS
  • ORTHOLOGUES
  • PARALOGUES

MULTI-GENE OUTPUT

Species                       Gene
Echinococcus multilocularis   CDKD1.14
Echinococcus multilocularis   CLK2.7
Echinococcus multilocularis   EmuJ_000000300
Echinococcus multilocularis   EmuJ_000000800
Echinococcus multilocularis   EmuJ_000001200
Echinococcus multilocularis   EmuJ_000001400
Echinococcus multilocularis   EmuJ_000002000
Echinococcus multilocularis   EmuJ_000002800
Echinococcus multilocularis   EmuJ_000003100
Echinococcus multilocularis   EmuJ_000003600
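A multi-gene query like the one above can also be scripted, because BioMart accepts queries written as XML. The sketch below only builds such a query string. The service URL, the schema and dataset names (`parasite_mart`, `wbps_gene`), the filter name `species_id_1010`, the species value `ecmultprjeb122` and the attribute names are all assumptions and should be checked against the XML that the BioMart web interface generates for your own query.

```python
import xml.etree.ElementTree as ET

# Assumed endpoint; the XML would be sent as the "query" parameter of a request.
MART_URL = "https://parasite.wormbase.org/biomart/martservice"


def build_biomart_query(species_filter_value, attributes):
    """Build a BioMart XML query string for one species and a list of attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "parasite_mart",  # assumed schema name
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset",
                            {"name": "wbps_gene", "interface": "default"})  # assumed dataset
    # Restrict the results to a single genome (filter name is an assumption).
    ET.SubElement(dataset, "Filter",
                  {"name": "species_id_1010", "value": species_filter_value})
    # One output column per requested attribute.
    for name in attributes:
        ET.SubElement(dataset, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed identifiers for Echinococcus multilocularis and for gene ID / gene name columns.
query_xml = build_biomart_query("ecmultprjeb122",
                                ["wbps_gene_id", "external_gene_id"])
print(query_xml)
```

Building the XML separately from sending it keeps the query easy to inspect and to compare with the version exported from the web interface before anything is submitted.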

22 of 40

https://mberriman.github.io/BSP2024/

23 of 40

Workshop, part two

6. Overview and Aims

7. Tools
  • BLAST
  • EXERCISE
  • The genome browser
  • VEP
  • EXERCISE

8. The WormBase ParaSite Expression browser
  • EXERCISE

9. Gene-set enrichment analysis
  • EXERCISE

24 of 40

BLAST

BLAST compares nucleotide or protein sequences to all sequences in WormBase ParaSite using local alignments.

[Figure: query sequences aligned against database sequences]

Gene     Species          E-Score
Gene A   S. haematobium   4.3e-63
Gene B   S. haematobium   1e-20
Gene C   S. bovis         0.05
…

BLAST calculates an Expectation Score (E-value): the number of hits expected to be seen by chance when searching a database of equivalent size.
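Because the E-value is the number of hits expected by chance, smaller values mean more convincing hits. As a minimal sketch, the three example hits from the table can be filtered and ranked in Python. The cut-off of 1e-5 and the function name `significant_hits` are assumptions chosen for illustration, not a WormBase ParaSite default.

```python
# The three example hits from the slide: (gene, species, E-value).
hits = [
    ("Gene A", "S. haematobium", 4.3e-63),
    ("Gene B", "S. haematobium", 1e-20),
    ("Gene C", "S. bovis", 0.05),
]


def significant_hits(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cut-off, most significant first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


for gene, species, evalue in significant_hits(hits):
    print(f"{gene}\t{species}\t{evalue:.1e}")
```

With this cut-off Gene A and Gene B are kept, while Gene C (E-value 0.05) is dropped as a hit that could plausibly have arisen by chance.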

25 of 40

Genome Browsers

  • Visualise a gene model in its genomic context
  • Assess whether a gene model is correct
  • Visualise functional genomics data

26 of 40

Visualising variants on AlphaFold models

FROM A VCF FILE WITH VARIANTS…

…TO PREDICTING THEIR EFFECT…

…TO VISUALISING THEM ON 3D PROTEIN MODELS…
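The first step of this workflow, reading variants out of a VCF file, can be sketched as follows. This covers only the parsing of the fixed VCF columns; predicting the effect of each variant is what VEP does and is not reproduced here. The function name `parse_vcf_variants` and the example records are hypothetical.

```python
def parse_vcf_variants(lines):
    """Return (chromosome, position, ref, alt) tuples from VCF text lines."""
    variants = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, "##" metadata lines and the "#CHROM" header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, _variant_id, ref, alt = fields[:5]
        # A record may list several alternative alleles separated by commas.
        for allele in alt.split(","):
            variants.append((chrom, int(pos), ref, allele))
    return variants


# Hypothetical VCF content, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1042\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2310\t.\tC\tT,G\t60\tPASS\t.",
]

for variant in parse_vcf_variants(example_vcf):
    print(variant)
```

Splitting multi-allelic records into one tuple per alternative allele keeps each variant independent, which is convenient when each one is later assigned its own predicted effect.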

27 of 40

Explore, use and analyse transcriptomics data

VISUALISE RNA-SEQ DATASETS BY CONDITION

EXPLORE DIFFERENTIAL GENE EXPRESSION EXPERIMENTS

PERFORM GENE-SET ENRICHMENT ANALYSIS

28 of 40

Gene-set enrichment analysis

A METHOD USED TO DETERMINE WHICH BIOLOGICAL PROCESSES, MOLECULAR FUNCTIONS, AND CELLULAR COMPONENTS ARE SIGNIFICANTLY ASSOCIATED WITH A SET OF GENES OR PROTEINS.

FROM A LIST OF GENES…

…TO A LIST OF ENRICHED TERMS OF GENE FUNCTION.
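One common way to decide whether a term is enriched in a gene list is a one-sided hypergeometric test, sketched below. This is an assumption about the general approach, not a description of the specific tool used in the workshop, and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, annotated_in_sample):
    """One-sided hypergeometric p-value for over-representation of a term.

    population: number of genes in the background set
    annotated: background genes annotated with the term
    sample: number of genes in the list of interest
    annotated_in_sample: genes in the list annotated with the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Probability of seeing at least the observed number of annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1000 background genes, 50 carry the term,
# and 5 of the 20 genes in our list carry it (1 would be expected by chance).
p = enrichment_p_value(population=1000, annotated=50, sample=20, annotated_in_sample=5)
print(f"p = {p:.3g}")
```

In practice one such test is run per term, so the resulting p-values need correcting for multiple testing before a term is reported as enriched.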

29 of 40

Show your support!

  • SUBSCRIBE TO OUR MAILING LIST
  • FOLLOW US ON TWITTER: @WBParaSite
  • HELPDESK: parasite-help@wormbase.org

30 of 40

John Tate, Ensembl Web Back-end Project Leader
EMBL's European Bioinformatics Institute (EMBL-EBI)

Prof. Matthew Berriman, PI
School of Infection & Immunity, College of Medical, Veterinary & Life Sciences, University of Glasgow

Dr. Sarah Dyer, PI
EMBL's European Bioinformatics Institute (EMBL-EBI)

Steph Brown, Bioinformatician
School of Infection & Immunity, College of Medical, Veterinary & Life Sciences, University of Glasgow

Mehrnaz Charkhchi, Ensembl Web Back-end Developer
EMBL's European Bioinformatics Institute (EMBL-EBI)

31 of 40

How do we currently annotate?

A FOCUS ON ANNOTATION

  • Coordinates of each exon, translated vs untranslated regions, ncRNAs
  • WormBase receives gene predictions from external genome projects
  • Large-scale producers & small-scale independent researchers
  • Quality varies

BRAKER PIPELINE

32 of 40

How do we currently annotate?

MANUAL REVIEW IS ALWAYS NECESSARY BUT RARELY AVAILABLE

WHAT'S WRONG WITH THE 5' END OF THIS GENE?

SOLUTION

SPLIT FIRST INTRON TO MAKE A NEW 5' GENE?

APPEARS TO BE TWO QUITE DIFFERENT RNA-SEQ COVERAGE LEVELS

...INDEPENDENT START SITES

Steve Doyle

APOLLO ANNOTATION TOOL

33 of 40

How do we currently annotate?

MANUAL REVIEW IS ALWAYS NECESSARY BUT RARELY AVAILABLE

Z

W

THE W LOCUS IS ALREADY ANNOTATED AS TWO GENES...

...AND THE SPLIT IS SUPPORTED BY RNA-SEQ EVIDENCE

APOLLO ANNOTATION TOOL

ARTEMIS COMPARISON TOOL

34 of 40

How do we currently annotate?

MANUAL REVIEW IS ALWAYS NECESSARY BUT RARELY AVAILABLE

APOLLO ANNOTATION TOOL

35 of 40

How do we currently annotate?

MANUAL REVIEW IS ALWAYS NECESSARY BUT RARELY AVAILABLE

APOLLO ANNOTATION TOOL

36 of 40

How do we currently annotate?

MANUAL REVIEW IS ALWAYS NECESSARY BUT RARELY AVAILABLE

APOLLO ANNOTATION TOOL

37 of 40

Description of gene function

AN AREA THAT STILL NEEDS CURATION

FUNCTIONAL ANNOTATION

  • Importing descriptions from UniProt
    • Historically pushing annotations to UniProt
    • UniProt and WormBase ParaSite (WBPS) are sometimes out of sync
  • Gene names from the community imported as synonyms
    • Reactively, not systematically
  • Coming soon: Automated gene descriptions from WormBase (WB) for WB species

GENE SYNONYM IMPORTED FOR STRONGYLOIDES STERCORALIS

38 of 40

A FOCUS ON ANNOTATION

Community annotation

DO I WANT TO GET INVOLVED?

HOW MUCH TIME COULD I SPEND ON IT?

WHAT WOULD I WANT TO GET OUT OF IT?

WHAT SHOULD BE THE FOCUS?

39 of 40

40 of 40

User comments on gene pages

A FORM OF COMMUNITY ANNOTATION

Gene Name

Gene Structure

Gene Description