1 of 47

Transcriptomics and proteomics of SARS-CoV-2

- an update on recent COVID-19 workflow developments

24 February 2021

17.00 CET

Nathan Roach

Milad Miladi

Pratik Jagtap

Subina Mehta

usegalaxy.*

2 of 47

Characterization of SARS-CoV-2 long read sequencing

Nathan Roach

GalaxyWorks LLC

Work largely done by Wolfgang Maier

3 of 47

COVID-19 analysis on usegalaxy.*

https://covid19.galaxyproject.org

  • All publicly accessible via GitHub
  • 6 different types of analysis
  • 5 different Galaxy servers - including Galaxy Australia
  • Workflows and tools available on all servers
  • Workload shared amongst global clouds and compute resources
  • Reproducible across multiple servers
  • Analysis is ongoing

4 of 47

covid19.galaxyproject.org variation analysis workflows

Workflow steps: mapping, variant calling, variant annotation; primer trimming and flagging of “tainted” amplicons

Tool chains shown:

  • bwa-mem, lofreq, snpEff (covid-19 release)
  • bwa-mem, lofreq, snpEff (covid-19 release), ivar
  • minimap2, medaka, snpEff (covid-19 release)

5 of 47

Illumina WGS

Illumina ARTIC

ONT ARTIC

aggressively call all variants that you reasonably can

use soft filters (VCF INFO field) to flag the most questionable variants
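The soft-filter idea above can be sketched in plain Python: every called variant is kept, and questionable ones only receive an extra INFO flag. The flag name `LOW_AF`, the `AF` INFO key, and the 0.05 threshold are assumptions chosen for illustration, not the settings of the Galaxy workflows. The sketch also assumes one ALT allele per record.

```python
def soft_filter(vcf_line: str, min_af: float = 0.05, flag: str = "LOW_AF") -> str:
    """Append a flag to the INFO column when AF is below min_af.

    Header lines (starting with '#') and records without an AF value
    are returned unchanged. No record is ever dropped.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th VCF column
    # Parse key=value pairs; bare flags have no '=' and map to None.
    pairs = dict(
        item.split("=", 1) if "=" in item else (item, None)
        for item in info.split(";")
        if item and item != "."
    )
    af = pairs.get("AF")
    if af is not None and float(af) < min_af and flag not in pairs:
        fields[7] = f"{info};{flag}"  # hypothetical flag name, see lead-in
    return "\t".join(fields)


# Example records (values are made up for the demonstration).
low = "NC_045512.2\t241\t.\tC\tT\t100\tPASS\tDP=500;AF=0.02"
high = "NC_045512.2\t3037\t.\tC\tT\t100\tPASS\tDP=500;AF=0.98"
print(soft_filter(low))
print(soft_filter(high))
```

Because the flag lives in the record rather than removing it, downstream reporting or consensus steps can decide for themselves how strictly to treat flagged variants.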

Reporting

Consensus building
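As a rough illustration of consensus building from called variants, the sketch below applies single-base substitutions to a reference when their allele frequency reaches a threshold. The 0.7 threshold, the tuple layout, and the restriction to SNVs are assumptions made for this example; the actual workflow may handle indels, masking, and thresholds differently.

```python
def build_consensus(reference: str, variants, min_af: float = 0.7) -> str:
    """Apply SNVs with allele frequency >= min_af to the reference.

    variants: iterable of (pos, ref, alt, af) with 1-based positions.
    Only single-base substitutions are handled in this sketch.
    """
    seq = list(reference)
    for pos, ref, alt, af in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # indels are out of scope here
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"REF mismatch at position {pos}")
        if af >= min_af:
            seq[pos - 1] = alt
    return "".join(seq)


# Toy reference and variants, not real SARS-CoV-2 data.
toy_consensus = build_consensus(
    "ACGTACGT",
    [(2, "C", "T", 0.9), (5, "A", "G", 0.2)],
)
print(toy_consensus)
```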

6 of 47

Most recent Galaxy workflow for ARTIC protocol ONT variant calling

7 of 47

VCF

Reports

Consensus FASTA

Downstream variant analysis/providers

Direct data exploration through tabular datasets and plots

nextstrain, GISAID, genome surveillance initiatives

8 of 47

(+) sense SARS-CoV-2 RNAs consist of gRNAs and sgRNAs

Kim, Cell 2020; The Architecture of SARS-CoV-2 Transcriptome

9 of 47

sgRNA mapping via spliced minimap2 alignment
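In SAM output, a spliced aligner reports a skipped reference region as an `N` operation in the CIGAR string, so the leader-to-body jump of an sgRNA read can be read off the alignment. A minimal sketch of extracting those skipped intervals follows; the function name is hypothetical and the example CIGAR strings are made up.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(pos: int, cigar: str):
    """Return (start, end) reference intervals skipped by 'N' operations.

    pos is the 1-based leftmost mapping position (SAM POS field).
    Returned intervals are 1-based and inclusive.
    """
    ref_pos = pos
    junctions = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in "MDN=X":
            ref_pos += length  # these operations consume the reference
    return junctions


print(junctions_from_cigar(10, "50M100N30M"))
```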

10 of 47

sgRNA binning by TRS-B sequence overlap with reads
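The binning step can be sketched as an interval check: a read is assigned to the sgRNA whose TRS-B site is reached where the alignment resumes after the skipped region. The site names, coordinates, and the 15 nt tolerance below are placeholders for illustration only; they are not real genome positions or the workflow's parameters.

```python
def assign_sgrna(junctions, trs_b_sites, window: int = 15):
    """Assign a read to the sgRNA whose TRS-B interval is reached by a junction.

    junctions: list of (start, end) skipped intervals, 1-based inclusive.
    trs_b_sites: mapping of sgRNA name -> (start, end) of its TRS-B sequence.
    A junction matches when the first aligned base after the skip lies
    within `window` nt of the TRS-B interval.
    Returns the matching name, or None if nothing or more than one matches.
    """
    hits = set()
    for _, skip_end in junctions:
        resume = skip_end + 1
        for name, (start, end) in trs_b_sites.items():
            if start - window <= resume <= end + window:
                hits.add(name)
    return hits.pop() if len(hits) == 1 else None


# Hypothetical sites with toy coordinates.
toy_sites = {"sgRNA_A": (1000, 1006), "sgRNA_B": (2000, 2006)}
print(assign_sgrna([(60, 995)], toy_sites))
```

Returning `None` for ambiguous reads keeps the bins clean at the cost of discarding some reads; a real analysis might instead report all candidate bins.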

11 of 47

RNA modifications of SARS-CoV-2

using nanopore direct RNA-seq

Milad Miladi

Bioinformatics/Galaxy Group, University of Freiburg

A joint work with:

Jonas Fuchs, Wolfgang Maier, Ralf Gilsbach, Björn Grüning

12 of 47

Coronaviruses

Coronaviruses ➜ Beta-coronaviruses ➜ SARS-CoV-2

Coronaviruses have a positive-sense single-stranded RNA genome.

[Anthony Fauci, mRNA Health Conference, 2020;

Jin et al. Viruses, 2020.]

Milad Miladi, University of Freiburg

13 of 47

SARS-CoV-2 transcriptome

The coronavirus replication cycle produces a complex, nested set of sub-genomic RNAs from the genome.

[Kim et al. Cell, 2020;

da Costa et al. Arch Virol, 2020]

14 of 47

RNA modifications

After transcription, enzymes target the RNA and introduce a variety of modifications onto nucleobases.

About 170 types of modification have been discovered!

[Niehrs lab, imb.de, 2019; Jonkhout et al. RNA, 2017; Machnicka et al. NAR 2012.]

15 of 47

Direct RNA sequencing (DRS) using Oxford Nanopore:

modification detection

unbiased PCR-free quantification

[Workman et al. Nature Methods 2019;

StreetScience community 2019.]

16 of 47

DRS data processing and modification detection workflows

  • Read alignment & assignment: reads from infected cell ➜ map to host + virus ➜ classify viral reads ➜ genome & sub-genome viral reads
  • Tombo: train canonical base model ➜ match reads to the model ➜ modification scores
  • Nanocompore/Tombo: control reads and viral reads ➜ compare signal distributions ➜ modification scores

[Stoiber et al. 2016; Leger et al. 2019]
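To make "compare signal distributions" concrete, the sketch below computes a two-sample Kolmogorov-Smirnov statistic between the signal values observed at one position in control reads and in viral reads. It is a generic stand-in written for illustration, not the API or the exact scoring used by Tombo or Nanocompore, and the current values in the example are invented.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented per-read signal values at a single reference position.
control = [80.1, 81.4, 79.8, 80.6, 80.9]
viral = [86.2, 85.7, 87.0, 80.3, 86.5]
print(ks_statistic(control, viral))
```

A statistic near 0 means the two signal distributions are indistinguishable at that position, while a value near 1 means they barely overlap, which is the kind of shift a modified base could produce.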

17 of 47

SARS-CoV-2 DRS:

an overview of the results

18 of 47

DRS: mapping statistics from three isolates

More than 60% of the poly-A-tailed transcripts from the infected cells are viral!

19 of 47

Reads and positions are modified at varying but consistent rates.

Modification sites are enriched for adenine.

Consistent & conserved modifications in the 3 biological replicates.
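One simple way to quantify a statement such as adenine enrichment at modification sites is to compare each base's frequency at the called sites with its frequency over the whole sequence. The function below is a sketch under that assumption; the sequence and site list are toy inputs, not results from the study.

```python
from collections import Counter


def base_enrichment(sequence: str, sites):
    """Ratio of each base's frequency at `sites` to its overall frequency.

    sites: 0-based positions in `sequence`. A ratio above 1 means the
    base is over-represented at the given sites.
    """
    if not sequence or not sites:
        raise ValueError("sequence and sites must be non-empty")
    overall = Counter(sequence)
    at_sites = Counter(sequence[i] for i in sites)
    n_all, n_sites = len(sequence), len(sites)
    return {
        base: (at_sites[base] / n_sites) / (overall[base] / n_all)
        for base in overall
    }


# Toy example: all three "modified" positions are adenines.
toy_enrichment = base_enrichment("AACGTA", [0, 1, 5])
print(toy_enrichment)
```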

20 of 47

Modifications are enriched at the 3’ end.

Modifications occur in the context of functional RNA elements.

21 of 47

Collaboration of:

  • Galaxy/Bioinformatics Group, University of Freiburg
  • Inst. of Virology, Universitätsklinikum Freiburg
  • Inst. for Cardiovascular Physiology, University of Frankfurt

Summary: SARS-CoV-2 RNA modifications

  • Modifications can impact the fate of protein-coding genes
  • Modifications from 3 biological replicates in human lung epithelial cells
    • conserved modification patterns
  • Modification of genomic and sub-genomic RNAs at the regulatory elements
    • potential therapeutic targets
  • Bioinformatics analysis in Galaxy (https://usegalaxy.eu)
    • reproducible and accessible to everyone

Ongoing work: (with Jonas Fuchs)

  • Methylation inhibitors and METTL3 KO cells impact modifications & the course of infection

Milad Miladi

University of Freiburg | EMBL

miladim@cs.uni-freiburg.de

miladi@embl.de

22 of 47

GALAXY WORKFLOWS FOR THE ANALYSIS OF COVID-19 MASS SPECTROMETRY DATASETS

24 February 2021

10.00 CT/ 11.00 ET/ 17.00 CET

Pratik Jagtap

Subina Mehta

University of Minnesota

Galaxy-P Team

23 of 47

COVID-19 DETECTION METHODS

asms.org

Image credit (left): Gerd Altmann, Pixabay License, https://pixabay.com/illustrations/corona-coronavirus-virus-covid-19-4959447

MS

24 of 47

COVID-19 DETECTION MASS SPECTROMETRY METHODS

In silico approach toward the identification of unique peptides from viral protein infection: Application to COVID-19. Orsburn et al. doi: https://doi.org/10.1101/2020.03.08.980383. April 2020

Mass Spectrometric Identification of SARS-CoV-2 Proteins from Gargle Solution Samples of COVID-19 Patients. Ihling et al. J Proteome Res. 19(11):4389-4392. doi: 10.1021/acs.jproteome.0c00280. April 2020

Shotgun proteomics analysis of SARS-CoV-2-infected cells and how it can optimize whole viral particle antigen production for vaccines. Grenga et al. Emerg Microbes Infect. 9(1):1712-1721. doi: 10.1080/22221751.2020.1791737. May 2020

25 of 47

CLINICAL SAMPLES

Dataset | ProteomeXchange ID | PubMed ID | Lab
Gargling solution | PXD019423 | PMID: 32568543 | Sinz Lab (Halle, Germany)
Nasopharyngeal swabs | PXD020394 | PMID: 32835036 | Lima Lab (Montevideo, Uruguay)
Respiratory tract samples | PXD021328 | PMID: 33273458 | Carvalho Lab (São Paulo, Brazil)
Bronchoalveolar lavage fluid (BALF) | PXD022085 | PMID: 33098359 | Cheng Lab (Wuhan, China)
Lung samples | PXD018094 | PMID: 33060566 | Zhong Lab (Beijing, China)
Gut microbiome | PXD023099 | Unpublished | Yan Lab (Guangzhou, China)

CELL CULTURE

Dataset | ProteomeXchange ID | PubMed ID | Lab
Time series | PXD018594 | PMID: 32619390 | Armengaud Lab (Bagnols‐sur‐Cèze, France)
8-hour time point | PXD018804 | PMID: 32462744 | Armengaud Lab (Bagnols‐sur‐Cèze, France)
Proteo-transcriptomics analysis | PXD018241 | PMID: 32723359 | Matthews Lab (Bristol, UK)
Host-viral protein interaction | PXD018117 | PMID: 32353859 | Krogan Lab (San Francisco, CA)

https://www.ucsf.edu/magazine/covid-body

26 of 47

https://covid19.galaxyproject.org/proteomics/

Peter Thuy-Boun et al. http://dx.doi.org/10.1021/acs.jproteome.0c00822

A rigorous evaluation of optimal peptide targets for MS-based clinical diagnostics of Coronavirus Disease 2019 (COVID-19).

Andrew Rajczewski et al (Preprint in MedRxiv)

https://www.medrxiv.org/content/10.1101/2021.02.09.21251427v1

27 of 47

Determining the optimal peptides for COVID-19 diagnosis in Galaxy

28 of 47

Multiple datasets were used to create a peptide panel and to validate its utility in diagnosing SARS-CoV-2 infection

Peptide Panel Generation

Peptide Validation

29 of 47

Database Search Workflow

30 of 47

Peptide Validation Workflow

31 of 47

Peptides across SARS-CoV-2 were detected and validated in Galaxy

32 of 47

PSMs of SARS-CoV-2 peptides in the upper respiratory clinical datasets are of higher confidence than those in the deep-lung datasets

33 of 47

Protein assignment of detected and validated SARS-CoV-2 peptides

34 of 47

Four peptides were selected as optimal targets for SARS-CoV-2 detection

35 of 47

BLAST-P shows specificity of these peptides to SARS-CoV-2

MetaTryp sequence identity

The four-peptide panel is specific to SARS-CoV-2

36 of 47

Conclusions

  • Based on clinical and cell-culture MS datasets, we identified peptides throughout the SARS-CoV-2 proteome.
  • High-confidence PepQuery scoring and manual spectra interrogation yielded four confidently identified peptides:
    • MAGNGGDAALALLLLDR
    • RGPEQTQGNFGDQELIR
    • DGIIWVATEGALNTPK
    • IGMEVTPSGTWLTYTGAIK
  • Deep-lung samples may be unsuitable for diagnosis of COVID-19 using targeted clinical proteomics experiments.
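For a targeted MS assay built on the four peptides listed above, precursor m/z values are a typical starting point. The sketch below computes them from standard monoisotopic residue masses. It assumes unmodified peptides (for example, no methionine oxidation) and a default charge of 2+, both of which are illustrative choices rather than details taken from the study.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # added once per positive charge

PANEL = [
    "MAGNGGDAALALLLLDR",
    "RGPEQTQGNFGDQELIR",
    "DGIIWVATEGALNTPK",
    "IGMEVTPSGTWLTYTGAIK",
]


def peptide_mz(sequence: str, charge: int = 2) -> float:
    """Monoisotopic m/z of an unmodified peptide at the given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral_mass = sum(MONO[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


for peptide in PANEL:
    print(peptide, round(peptide_mz(peptide), 4))
```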

37 of 47

METAPROTEOMICS ANALYSIS OF SARS-CoV-2 INFECTED PATIENT SAMPLES REVEALS PRESENCE OF POTENTIAL CO-INFECTING MICROORGANISMS

38 of 47

CO-INFECTION IN COVID-19 PATIENTS

  • Co-infection has an effect on the diagnosis, symptoms, treatment and mortality.
  • Patients could be co-infected before COVID-19 infection or during hospitalization.
  • Nosocomial infections can affect antibiotics treatment plans due to antibiotic resistance.
  • Culture-based detection methods prolong diagnosis of the disease.

https://link.springer.com/article/10.1007/s00253-020-10814-6

Zhu et al. Co-infection with respiratory pathogens among COVID-2019 cases. Virus Res. (2020) 285:198005.

Chen et al. The microbial coinfection in COVID-19. Appl Microbiol Biotechnol. (2020) 104(18):7777-7785.

Mirzaei et al. Bacterial co-infections with SARS-CoV-2. IUBMB Life. (2020) 72(10):2097-2111.

Bao et al. Oral Microbiome and SARS-CoV-2: Beware of Lung Co-infection. Front Microbiol. (2020) 11:1840.

39 of 47

METAPROTEOMICS WORKFLOW

https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00822

40 of 47

DATASETS AND ORGANISMS DETECTED

Dataset | Organisms detected in COVID-19 patient samples
Gargling solution (PXD019423) | Streptococcus pneumoniae, Lactobacillus rhamnosus and SARS-CoV-2
Oro- and naso-pharyngeal tract (PXD020394) | Pseudomonas monteilii, Pseudomonas sp. Bc-h, Acinetobacter ursingii and SARS-CoV-2
Respiratory tract (PXD021328) | SARS-CoV-2

Organism | Note
Streptococcus pneumoniae | Causes pneumonia (respiratory-tract infection)
Lactobacillus rhamnosus | Probiotic
Pseudomonas sp. Bc-h | Unclassified Pseudomonas
Pseudomonas monteilii | Meningoencephalitis
Acinetobacter ursingii | Bacteremia

41 of 47

SPECTRAL VALIDATION USING LORIKEET

Lactobacillus rhamnosus

Acinetobacter ursingii

SARS-CoV-2

Streptococcus pneumoniae

42 of 47

CONCLUSIONS

  • Galaxy workflows are available for analysis of COVID-19 MS datasets. (https://covid19.galaxyproject.org/proteomics/)
  • We could detect peptides that spanned the SARS-CoV-2 proteome
  • Metaproteomics analysis revealed presence of potential co-infecting microorganisms in COVID-19 patient samples.
  • Future plans

43 of 47

RESOURCES AVAILABLE AT

https://covid19.galaxyproject.org/proteomics/

44 of 47

Minnesota Supercomputing Institute

James Johnson

Michael Milligan

Maria Doyle

Melbourne , Australia

University of Minnesota

Timothy Griffin Subina Mehta

Andrew Rajczewski

Dinh Duy An Nguyen

Emma Leith

Ray Sajulga

Praveen Kumar

Caleb Easterly

Marie Crane

Biologists / collaborators

Joel Rudney

Maneesh Bhargava

Amy Skubitz

Chris Wendt

Kristin Boylan

Brian Sandri

Alexa Pragman

Harald Barsnes Marc Vaudel University of Bergen, Norway

University of Freiburg,

Freiburg, Germany

VIB, UGhent, Belgium

Matt Chambers

Nashville, TN

Alessandro Tanca

Porto Conte Ricerche, Italy

Carolin Kolmeder

University of Helsinki, Finland

Thilo Muth

Robert Koch Institut

Jeremy Fisher

Yuzhen Ye

Sujun Li

Indiana University

Peter Thuy-Boun

Dennis Wolan

Scripps Institute

Brook Nunn

U of Washington

Lennart Martens

Bart Mesuere

Bjoern Gruening

Lloyd Smith (Co-I)

Michael Shortreed

UW-Madison

Anamika Krishanpal

Persistent Systems Limited

Brian Searle

Institute of Systems

Biology

Funding

ACKNOWLEDGMENTS

Magnus Øverlie Arntzen

NMBU,

Oslo, Norway

galaxyp.org

Saskia Hiltemann

45 of 47

Next up!

Galaxy-ELIXIR webinar series: Advanced Features

46 of 47

Acknowledgments

usegalaxy.org efforts are funded by NIH Grants U41 HG006620 and NSF ABI Grant 1661497. usegalaxy.eu is supported by the German Federal Ministry of Education and Research grants 031L0101C and de.NBI-epi. Galaxy and HyPhy integration is supported by NIH grant R01 AI134384. usegalaxy.org.au is supported by Bioplatforms Australia and the Australian Research Data Commons through funding from the Australian Government National Collaborative Research Infrastructure Strategy. Hyphy.org development team is supported by NIH grant R01GM093939. usegalaxy.be is supported by the Research Foundation-Flanders (FWO) grant I002919N and the Flemish Supercomputer Center (VSC). EOSC-Life has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 824087

47 of 47