1 of 13

Tom Brown, Diego De Panis, Romane Libouban, Saim Momin, Arash Kadkhodaei, Anthony Bretaudeau, Björn Grüning, Camila Mazzoni

2 of 13

A distributed network committed to producing high-quality reference genomes for all species

Continental

National/regional

Taxonomy

People, species, technologies differ, but the standards we are trying to reach are consistent.

3 of 13

Not all of these genomes have been annotated, and not all are of the high quality we expect from the EBP

A growing database of reference genomes

What can we as a community do to help realise the goal of high-quality reference genomes for all species?

Technological advances + democratisation and distribution of the genome assembly process = explosion in reference genomes

4 of 13

EBP Assembly Metrics

C.C.Q40 - Telomere-2-Telomere

6.C.Q40 - Reference Standard

5.C.Q40 - Limited Material
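
As a rough illustration of how a target such as 6.C.Q40 might be checked, the Python sketch below assumes the usual reading of the notation: the first field is the order of magnitude (log10) of the contig N50, C means chromosome-scale scaffolds, and Q40 is a Phred-scaled consensus quality of at least 40 (about one error per 10,000 bases). The function names, and the use of N50 rather than NG50, are assumptions made for this example and not part of the EBP definition.

```python
import math


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value into a per-base error rate."""
    return 10 ** (-qv / 10)


def meets_target(contig_lengths, chromosome_scale, qv,
                 min_contig_exp=6, min_qv=40):
    """Check an assembly against an x.C.Qy style target (hypothetical helper).

    contig_lengths   -- contig lengths in base pairs
    chromosome_scale -- True if scaffolds are chromosome-scale (the "C" field)
    qv               -- consensus quality value of the assembly
    """
    contig_exp = math.floor(math.log10(n50(contig_lengths)))
    return contig_exp >= min_contig_exp and chromosome_scale and qv >= min_qv


if __name__ == "__main__":
    contigs = [2_000_000, 1_500_000, 500_000]  # made-up lengths for illustration
    print("contig N50:", n50(contigs))
    print("meets 6.C.Q40:", meets_target(contigs, True, 42.0))
    print("meets 5.C.Q40:", meets_target(contigs, True, 42.0, min_contig_exp=5))
```

Under this reading, the Limited Material target only relaxes the contig N50 threshold from 1 Mb to 100 kb, which is why it is passed here as a parameter.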

Varying:

Sequencing Technologies

Taxa

Ploidy

Heterozygosity

Develop a process to ensure high-quality genomes regardless of methodology

Many pipelines - a single gold standard

5 of 13

  • Read QC
  • Assembly Contiguity & Completeness
  • Gene Completeness
  • Contaminant screen
  • Software & pipeline versions

The ERGA Assembly Report - a community-developed Genome Assembly QC Document

Output report

Galaxy workflow

Create your own EAR!

👂

Saim Momin

Diego De Panis

6 of 13

Managed by github-action bot

The EAReview process

Saim Momin

Arash Kadkhodaei

7 of 13

High-quality Genome Assemblies - now with QC Reports

https://www.ebi.ac.uk/biodiversity/

9 of 13

Identifies the functional elements of the genome - primarily protein-coding sequences

Should encompass the entire functional proteome of the species of interest

Should be accurate in terms of location and structure of the gene models

May also highlight other elements such as non-coding RNA, transposable elements

Given only a small proportion of genomes even have an annotation, how can we determine what is high-quality and help others produce a good annotation for their genome?

What is a high-quality genome annotation?
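
One common way to put a number on "accurate in terms of location and structure" is to compare predicted exons against a trusted reference annotation and report exon-level precision and recall. The Python sketch below is a minimal illustration of that idea under simplifying assumptions: exons are given as (seqid, start, end, strand) tuples, only exact coordinate matches count, and the function name is hypothetical. It is not presented as the method used by ERGA or any particular tool.

```python
def exon_precision_recall(predicted, reference):
    """Return (precision, recall) for exact-match exons.

    predicted, reference -- iterables of (seqid, start, end, strand) tuples
    precision = matched exons / predicted exons
    recall    = matched exons / reference exons
    """
    pred, ref = set(predicted), set(reference)
    if not pred or not ref:
        raise ValueError("both exon sets must be non-empty")
    matched = len(pred & ref)
    return matched / len(pred), matched / len(ref)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    reference = [
        ("chr1", 100, 200, "+"),
        ("chr1", 300, 400, "+"),
        ("chr1", 500, 600, "+"),
        ("chr2", 50, 150, "-"),
    ]
    predicted = [
        ("chr1", 100, 200, "+"),
        ("chr1", 300, 410, "+"),  # wrong exon end, so not an exact match
        ("chr2", 50, 150, "-"),
    ]
    precision, recall = exon_precision_recall(predicted, reference)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

A measure like this only works when a reference annotation exists, which is exactly the gap the question on this slide points at.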

10 of 13

  • An open project at the ELIXIR BioHackathon Europe 2023

  • Allowing researchers to bring un-annotated genomes, annotation software and pipelines and QC tools developed in-house

  • Benchmark tools and pipelines across a range of taxa

  • Install these tools in Galaxy to allow future researchers to run the best performing tool on their genome
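
The benchmarking idea in the bullets above can be sketched as a small lookup: given one score per tool and taxon, pick the best-scoring tool for each taxon. The Python below is only an illustration. The tool names, taxa and scores are placeholders rather than BioHackathon results, and the function name and the "higher score is better" convention are assumptions.

```python
def best_tool_per_taxon(results):
    """Return {taxon: tool} for the highest-scoring tool in each taxon.

    results -- iterable of (tool, taxon, score) tuples; higher is better.
    On a tie, the tool seen first is kept.
    """
    best = {}
    for tool, taxon, score in results:
        if taxon not in best or score > best[taxon][1]:
            best[taxon] = (tool, score)
    return {taxon: tool for taxon, (tool, _score) in best.items()}


if __name__ == "__main__":
    # Placeholder benchmark table, not real measurements.
    results = [
        ("tool_a", "insects", 0.91),
        ("tool_b", "insects", 0.88),
        ("tool_a", "fungi", 0.72),
        ("tool_b", "fungi", 0.85),
    ]
    for taxon, tool in best_tool_per_taxon(results).items():
        print(f"{taxon}: {tool}")
```

Keeping the result per taxon, rather than a single overall winner, reflects the point that different tools may suit different groups of species.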

The Genomics Community setting the standards - BioHackEU23

Alice Dennis, Jèssica Gómez Garrido & The ERGA Annotation Committee

11 of 13

No one-size-fits-all pipeline when it comes to genome annotation

Machine-learning approaches such as Helixer produce annotations an order of magnitude faster than all other tools, at the cost of accuracy

There remain roadblocks such as access to transcript data or other high-quality genomes of related species

Avoid using a square peg in a round hole

Alice Dennis, Jèssica Gómez Garrido & The ERGA Annotation Committee

12 of 13

More annotation tools introduced to Galaxy

Try annotating your genome using:

  • Red
  • RepeatModeler & RepeatMasker
  • Braker3
  • Helixer
  • Funannotate

Share your results to help us as a community learn what works for each species!

Variety is the spice of life

Galaxy Genome Annotation

Romane Libouban, Anthony Bretaudeau

13 of 13

Become an ERGA Member!

Visit www.erga-biodiversity.eu for more information!

Want to find out more?