1 of 14

Navigating the variant landscape: integrating MAVE data analysis into

Adapted from: https://www.varianteffect.org/

Polina Polunina,

PhD student with

Freiburg Galaxy

June 27, 2024

Brno, Czech Republic

2 of 14

Multiplexed Assays of Variant Effect (MAVEs)

  • Genes provide the instructions necessary for life by encoding proteins.

  • What if a mutation changes the amino acid sequence of the protein?

  • A MAVE makes it possible to test all possible amino acid changes in a single experiment.

Source: https://www.youtube.com/watch?v=NRKj9llHy48&ab_channel=CMAP_CEGS

Loss of function

Improved function

No change

3 of 14

Multiplexed Assays of Variant Effect (MAVEs)

  • MAVEs systematically introduce mutations into genes/proteins.
  • Assess thousands of variants simultaneously.
  • Generate scores quantifying impact on protein stability, function, or phenotype.
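
To make the idea of a "score" concrete, here is a minimal sketch of one common style of scoring: a log2 enrichment ratio of each variant's read counts after versus before selection, normalized to wild type. This is an illustrative assumption, not the method used by TileSeqMave or any specific pipeline; the function name, the pseudocount, and all counts are hypothetical.

```python
import math

def log_enrichment(pre, post, wt_pre, wt_post, pseudocount=0.5):
    """Log2 of a variant's post/pre count ratio, relative to the wild-type ratio.

    The pseudocount (an assumed value) avoids division by zero for
    variants that drop out completely after selection.
    """
    variant_ratio = (post + pseudocount) / (pre + pseudocount)
    wt_ratio = (wt_post + pseudocount) / (wt_pre + pseudocount)
    return math.log2(variant_ratio / wt_ratio)

# Hypothetical read counts per variant: (before selection, after selection)
counts = {
    "p.Ala10Val": (1000, 950),    # roughly unchanged
    "p.Gly25Asp": (1000, 120),    # strongly depleted
    "p.Leu40Phe": (1000, 2100),   # enriched
}
WT_PRE, WT_POST = 10000, 10000    # hypothetical wild-type counts

scores = {
    variant: log_enrichment(pre, post, WT_PRE, WT_POST)
    for variant, (pre, post) in counts.items()
}

for variant, score in scores.items():
    print(f"{variant}\t{score:+.2f}")
```

Under this toy scheme, a score near zero corresponds to "no change", a clearly negative score to loss of function, and a positive score to improved function.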

e.g. TileSeqMave

Source: https://github.com/rothlab/tileseqMave

Source: Fowler et al. (2023, fig. 2A)

4 of 14

Significance of MAVE

  • MAVEs help to understand genetic variants in diseases like cancer.
  • They support precision medicine and therapeutic development.

Source: Fowler et al. (2023, fig. 1)

5 of 14

The MAVE Community and MaveDB

MaveDB: Collaborative Open-Source Database

  • Central repository for MAVE data.
  • Facilitates data sharing among researchers.

e.g. TileSeqMave

Source: https://github.com/rothlab/tileseqMave

Source: Fowler et al. (2023, fig. 2A)

6 of 14

Galaxy-MaveDB Project

  • Create a user-friendly environment within Galaxy for integrative analysis of MaveDB data from various organisms.
  • Integrate bioinformatic analysis pipelines for processing raw sequencing data and calculating MAVE scores into Galaxy.
  • Empower Galaxy users working with non-human organisms to analyze their data using primarily human MAVE scoresets.

7 of 14

Current Accomplishments

‘Send to Galaxy’ plugin in MaveDB - DONE!

Integration of MAVE scores into Galaxy’s SARS-CoV-2 clinical surveillance pipeline (Maier et al., 2021) - DONE!

Update of the VirHEAT tool: allele frequency plots enhanced with MAVE scores - DONE!

Data source in Galaxy - DONE!

Source: https://github.com/jonas-fuchs/virHEAT

8 of 14

Viral Surveillance Enhancements

  • MAVE scores integrated to improve analysis.
  • New features in VirHEAT tool:
    • Import CSV files with MAVE scores.
    • Visualize scores mapped to the genome and mutations on the heatmap.
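
The CSV import step can be sketched roughly as follows. This is not VirHEAT's actual code: the column names `hgvs_pro` and `score` are an assumption about the score file layout, the helper names are hypothetical, and the variants and values are invented placeholders rather than real measurements.

```python
import csv
import io

# Invented placeholder rows; the column names are an assumption.
SCORES_CSV = """hgvs_pro,score
p.Ala100Gly,0.5
p.Lys200Asn,-1.25
p.Glu250Lys,NA
"""

def load_scores(handle):
    """Read a variant -> score mapping, skipping rows without a numeric score."""
    scores = {}
    for row in csv.DictReader(handle):
        try:
            scores[row["hgvs_pro"]] = float(row["score"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or non-numeric value such as "NA"
    return scores

def annotate(mutations, scores):
    """Pair each observed mutation with its score, or None if it was not measured."""
    return [(mutation, scores.get(mutation)) for mutation in mutations]

scores = load_scores(io.StringIO(SCORES_CSV))
observed = ["p.Ala100Gly", "p.Trp300Ser"]  # hypothetical mutations seen in samples
annotated = annotate(observed, scores)

for mutation, score in annotated:
    print(mutation, "no MAVE score" if score is None else score)
```

Keeping unmeasured mutations as `None` instead of dropping them lets a plot still show the mutation while leaving the score track empty at that position.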

mavedb_spike_RBD_binding_scores

mavedb_spike_RBD_expression_scores

9 of 14

Future Plans

Application in Influenza surveillance

  • Incorporate MAVE scores in Influenza surveillance.
    • MAVE experiments done for 9 Flu genes
    • 11 publications (4 of them in MaveDB)
  • Epidemiologically more interesting than SARS-CoV-2
    • Seasonal outbreaks
    • Rapid mutation rate
    • Zoonotic potential
    • Ability to spill over into humans

10 of 14

Future Plans

MAVE experiments

  • hundreds of researchers around the world
  • ~ 11 million variant effect measurements

Challenges

  • bioinformatics analyses on local machines
  • tools run separately in different software -> semi-manual process
  • various pipelines -> complexity, potential for inconsistency

Enhancements

  • integrate these pipelines into Galaxy -> unified platform
  • file-source plugin for MaveDB in Galaxy -> bidirectional data transfer
  • accessibility, reproducibility of MAVE data analysis within Galaxy

e.g. TileSeqMave

Source: https://github.com/rothlab/tileseqMave

Source: Fowler et al. (2023, fig. 2A)

11 of 14

Bridging Human and Non-Human Research

MSA vs. MAVE

  • visual similarity between MAVE heatmaps and MSA matrices
  • MAVE data and MSAs seem to have similar dimensionality, but require different computational approaches
  • potential patterns between MAVE scores and conservation scores
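
One way to start looking for such patterns is to compare per-position conservation with per-position mean MAVE scores. The sketch below is an assumption about how this could be approached, not the project's method: it uses Shannon entropy of each alignment column as a simple conservation proxy and a plain Pearson correlation. The alignment and the scores are toy values constructed for illustration and say nothing about real data.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy alignment: four hypothetical homologous sequences, five positions.
msa = ["MKVLA", "MKILA", "MRVLS", "MKALT"]

# Toy mean MAVE score per position (more negative = less tolerant of mutation).
mean_scores = [-2.0, -1.2, -0.3, -1.9, -0.2]

# zip(*msa) iterates over alignment columns; low entropy means high conservation.
entropies = [column_entropy(column) for column in zip(*msa)]
r = pearson(entropies, mean_scores)

print("entropy per position:", [round(e, 2) for e in entropies])
print("Pearson r:", round(r, 2))
```

In this constructed example, conserved positions (entropy 0) carry the most negative mean scores, so the correlation is strongly positive. Real data would need a rank-based statistic and handling of positions with missing scores.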

… a work in progress …

Source: https://www.mavedb.org

Galaxy users like to work with non-human organisms

MAVE experiments are performed mostly on human genes

Multiple sequence alignment

MAVE scores heatmap

12 of 14

Summary

  • The project aims to bring MaveDB data to Galaxy users working with non-human organisms, facilitating collaborative research and comparative genomics studies

  • The integration of MaveDB with Galaxy will empower researchers to confidently analyze MAVE data in their favorite organisms

  • This integration holds significant potential to contribute to comparative genomics and molecular evolution studies

  • The developed tools and methods will be made available to the research community, with the potential for publications and wider impact

13 of 14

Acknowledgments

Galaxy Freiburg Team

  • Rolf Backofen
  • Björn Grüning
  • Bérénice Batut
  • Anika Erxleben-Eggenhofer
  • Wolfgang Maier
  • Helena Rasche
  • Paul Zierep
  • Sebastian Schaaf
  • José Manuel Domínguez
  • Anup Kumar
  • David López Tabernero
  • Engy Nasr
  • Pavankumar Videm
  • Sanjay Kumar Srikakulam
  • Mira Kuntz
  • Mina Hojat Ansari
  • Alireza Heidari
  • Laila Los
  • Amirhossein Naghsh Nilchi
  • Saim Momin
  • Arash Kadkhodaei
  • Daniela Schneider

WEHI, Melbourne, Australia

  • Alan Rubin

Institute of Virology, University Medical Centre Freiburg

  • Jonas Fuchs

Brotman Baty Institute for Precision Medicine, Seattle, USA

  • Jeremy Stone
  • Ashley Snyder

Connect with me

Matrix: @polina.polunina:matrix.org

Email: polunina@informatik.uni-freiburg.de

GitHub: github.com/PlushZ

Reach out to me today in person!

14 of 14

References

Fowler, D. M., Adams, D. J., Gloyn, A. L., Hahn, W. C., Marks, D. S., Muffley, L. A., Neal, J. T., Roth, F. P., Rubin, A. F., Starita, L. M., & Hurles, M. E. (2023). An Atlas of Variant Effects to understand the genome at nucleotide resolution. Genome Biology, 24(1). https://doi.org/10.1186/s13059-023-02986-x

Esposito, D., Weile, J., Shendure, J., Starita, L. M., Papenfuss, A. T., Roth, F. P., Fowler, D. M., & Rubin, A. F. (2019). MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biology, 20(1). https://doi.org/10.1186/s13059-019-1845-6

Maier, W., Bray, S., van den Beek, M., Bouvier, D., Coraor, N., Miladi, M., Singh, B., et al. (2021). Ready-to-use public infrastructure for global SARS-CoV-2 monitoring. Nature Biotechnology, 39(10), 1178–1179. https://doi.org/10.1038/s41587-021-01069-1