Tom Brown, Diego De Panis, Romane Libouban, Saim Momin, Arash Kadkhodaei, Anthony Bretaudeau, Björn Grüning, Camila Mazzoni
A distributed network committed to producing high-quality reference genomes for all species
Continental
National/regional
Taxonomy
People, species and technologies differ, but the standards we are trying to reach are consistent.
These genomes have not all been annotated, and not all of them meet the high quality standard we expect from the EBP
A growing database of reference genomes
What can we as a community do to help realise the goal of high-quality reference genomes for all species?
Technological advances +
democratisation and distribution of the genome assembly process =
explosion in reference genomes
EBP Assembly Metrics
C.C.Q40 - Telomere-2-Telomere
6.C.Q40 - Reference Standard
5.C.Q40 - Limited Material
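The metric codes above can be checked programmatically. The sketch below is a minimal illustration, not part of the ERGA or EBP tooling. It assumes the usual reading of the x.y.Q notation: the first field is the log10 of the minimum contig N50 in bp (6 means 1 Mb), the second is the same for scaffolds, "C" in either field means chromosome-scale, and Qnn is the minimum Phred-scaled consensus quality value. The names AssemblyStats, meets_ebp_metric and qv_to_error_rate are hypothetical and chosen for this example only.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Hypothetical container for the assembly statistics needed here."""
    contig_n50: int                           # contig N50 in bp
    scaffold_n50: int                         # scaffold N50 in bp
    qv: float                                 # Phred-scaled consensus quality
    chromosome_scale_contigs: bool = False    # contigs span whole chromosomes
    chromosome_scale_scaffolds: bool = False  # scaffolds span whole chromosomes


def _meets_level(level: str, n50: int, chromosome_scale: bool) -> bool:
    # Assumption: "C" requires chromosome-scale sequences; a digit n
    # requires an N50 of at least 10**n bp.
    if level == "C":
        return chromosome_scale
    return n50 >= 10 ** int(level)


def meets_ebp_metric(stats: AssemblyStats, metric: str) -> bool:
    """Return True if stats satisfy a code such as '6.C.Q40'."""
    contig_level, scaffold_level, quality = metric.split(".")
    min_qv = float(quality.lstrip("Q"))
    return (
        _meets_level(contig_level, stats.contig_n50,
                     stats.chromosome_scale_contigs)
        and _meets_level(scaffold_level, stats.scaffold_n50,
                         stats.chromosome_scale_scaffolds)
        and stats.qv >= min_qv
    )


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV to an expected per-base error rate."""
    return 10 ** (-qv / 10)


# Example with made-up numbers: 2.5 Mb contig N50, chromosome-scale scaffolds.
example = AssemblyStats(
    contig_n50=2_500_000,
    scaffold_n50=80_000_000,
    qv=45.2,
    chromosome_scale_scaffolds=True,
)
for code in ("C.C.Q40", "6.C.Q40", "5.C.Q40"):
    print(code, meets_ebp_metric(example, code))
```

Under these assumptions, Q40 corresponds to roughly one expected error per 10,000 bases, which is what qv_to_error_rate(40) computes.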
Varying:
Sequencing Technologies
Taxa
Ploidy
Heterozygosity
Develop a process to ensure high-quality genomes regardless of methodology
🥇
Many pipelines - a single gold standard
🥈
🥉
The ERGA Assembly Report - a community-developed Genome Assembly QC Document
Output report
Galaxy workflow
Create your own EAR!
👂
Saim Momin
Diego De Panis
Managed by github-action bot
The EAReview process
Saim Momin
Arash Kadkhodaei
High-quality Genome Assemblies - now with QC Reports
https://www.ebi.ac.uk/biodiversity/
Identifies the functional elements of the genome - primarily protein-coding sequences
Should encompass the entire functional proteome of the species of interest
Should be accurate in terms of location and structure of the gene models
May also highlight other elements such as non-coding RNA, transposable elements
Given that only a small proportion of genomes have an annotation at all, how can we determine what counts as high quality and help others produce a good annotation for their genome?
What is a high-quality genome annotation?
across a range of taxa
allow future researchers to run the best performing tool on their genome
The Genomics Community setting the standards - BioHackEU23
Alice Dennis, Jèssica Gómez Garrido & The ERGA Annotation Committee
No one-size-fits-all pipeline when it comes to genome annotation
Machine-learning approaches such as Helixer produce annotations an order of magnitude faster than all other tools, at the cost of accuracy
There remain roadblocks such as access to transcript data or other high-quality genomes of related species
Avoid using a square peg in a round hole
Alice Dennis, Jèssica Gómez Garrido & The ERGA Annotation Committee
More annotation tools introduced to Galaxy
Try annotating your genome using:
Share your results to help us as a community learn what works for each species!
Variety is the spice of life
Galaxy Genome Annotation
Romane Libouban, Anthony Bretaudeau
Become an ERGA Member!
Visit www.erga-biodiversity.eu for more information!
Want to find out more?