Host Prediction
By Malte and Varada, 23.09.2024
Background
Viruses affect microbial communities and therefore their environments THROUGH their hosts.
Release organic matter
Auxiliary metabolic genes
Background
Ideally, you would have a bacterial isolate to test phages on… but we only have our data
These “signals” are based on biological interactions
What are some biological interactions?
Adsorption - attachment
Insertion of the genome into the cell
Horizontal gene transfer
Defense/anti-defense mechanisms
Using cellular machinery
LYSIS
Prophages/viral genes
Ecological Dynamics
Host Defense Mechanisms
Viral genes inside host genome
Matching Coverage Profiles
K-mer profiles
CRISPR spacers
Transduction
Auxiliary Metabolic Genes
Host genes inside viral genome
Host genome
Viral genome
Metabolic gene?
Homology Based
Non-Homology Based
tRNAs
Using cell machinery
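The k-mer profile signal can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: the function names (`kmer_profile`, `rank_hosts`), the choice of Euclidean distance and the default k = 4 are all assumptions made for the example. Real tools typically also count both strands and use more refined dissimilarity measures.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies over the ACGT alphabet (fixed order)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing N or other symbols are ignored
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def euclidean(p, q):
    """Euclidean distance between two equally long frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_hosts(virus_seq, hosts, k=4):
    """Rank candidate hosts (name -> sequence) from most to least similar profile."""
    vp = kmer_profile(virus_seq, k)
    scored = [(euclidean(vp, kmer_profile(s, k)), name) for name, s in hosts.items()]
    return [name for _, name in sorted(scored)]


# Toy usage with made-up sequences (hypothetical data)
hosts = {"at_rich": "ATATATTATATAATAT", "gc_rich": "GCGCGGCGCCGCGGCG"}
ranking = rank_hosts("ATATATATATATATAT", hosts, k=2)
```

Because every host gets a distance, this kind of signal always returns a ranking, which is one reason it can match many hosts.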
Where to find the hosts?
Database
Prokaryotic fraction of your metagenome
“Bins” or MAGs
CRISPR spacers DB
RefSeq
IMG/VR / MGnify
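As a rough sketch of how a CRISPR spacer collection can be used: each host's spacers are searched against the viral contig on both strands. The function names and the toy sequences are hypothetical, and only exact matches are counted here; in practice an aligner that tolerates a small number of mismatches would normally be used.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def spacer_hits(viral_seq, spacers):
    """Count, per host, the spacers that match the viral sequence exactly.

    spacers: dict mapping host name -> list of spacer sequences.
    Both strands of the viral sequence are considered.
    """
    viral = viral_seq.upper()
    hits = {}
    for host, seqs in spacers.items():
        n = sum(
            1 for s in seqs
            if s.upper() in viral or reverse_complement(s) in viral
        )
        if n:
            hits[host] = n
    return hits


# Toy usage with made-up sequences (hypothetical data)
viral_contig = "ACGTTGCAAGGCTTAACCGGATCGATCG"
spacer_db = {
    "host_A": ["GGCTTAACC"],   # matches the forward strand
    "host_B": ["TTTTTTTTT"],   # no match
    "host_C": ["CGATCCGG"],    # matches the reverse strand
}
hits = spacer_hits(viral_contig, spacer_db)
```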
“Host-based” vs. “Phage-based”
Phage-host
Phage-phage
This is a feature of RaFAH also!
Disadvantages - homology-based
One does not simply BLAST a host
Disadvantages - non-homology-based
High recall, but matches many hosts!
% of hosts predicted correctly
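The trade-off between recall and the number of matched hosts can be made concrete with a small calculation. This is only a sketch: the slide does not define its metric precisely, so "recall" is assumed here to mean the fraction of viruses whose true host appears among the predicted hosts, and the genus names are illustrative data.

```python
def recall_and_ambiguity(predictions, truth):
    """Return (recall, mean number of predicted hosts per virus).

    predictions: dict virus -> set of predicted hosts
    truth:       dict virus -> true host
    """
    n = len(truth)
    correct = sum(1 for v, h in truth.items() if h in predictions.get(v, set()))
    mean_hosts = sum(len(predictions.get(v, set())) for v in truth) / n
    return correct / n, mean_hosts


# Toy usage (hypothetical predictions)
truth = {"v1": "Escherichia", "v2": "Bacillus", "v3": "Vibrio", "v4": "Pseudomonas"}
predictions = {
    "v1": {"Escherichia", "Salmonella", "Shigella"},
    "v2": {"Bacillus", "Listeria"},
    "v3": {"Vibrio", "Escherichia", "Salmonella"},
    "v4": {"Bacillus"},
}
recall, mean_hosts = recall_and_ambiguity(predictions, truth)
```

In this toy case the true host is recovered for three of four viruses, but each virus is assigned more than two hosts on average, so the prediction is not specific.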
Machine learning methods for phage-host interactions
Many methods – all have biases
Nie et al., Briefings in Bioinformatics, 2024
Many methods – all have biases
Roux et al., PLOS Biology, 2023
ML methods for phage-host interactions
Informative features for PHI
Boeckaerts et al., Sci Rep, 2021
Training data – the good
Roux et al., Nat Biotechnol, 2019
Training data – the bad
Camargo et al., NAR, 2023
IMG/VR 4 database
Training data – the ugly
-> subsample large datasets
-> random sampling of hosts distant to known hosts
-> model-based sampling
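One way to read "random sampling of hosts distant to known hosts" is sketched below. The assumptions are loud ones: "distant" is taken to mean "belongs to a different taxonomic family than every known host of that phage", and the function name, the family mapping and the pairs are hypothetical. Other distance definitions (for example a different order, or a genome-distance threshold) would work the same way.

```python
import random


def sample_negatives(positives, host_family, n_per_phage=1, seed=0):
    """Sample negative (phage, host) pairs from hosts in other families.

    positives:   list of known (phage, host) pairs
    host_family: dict host -> taxonomic family (assumed distance proxy)
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    known = {}
    for phage, host in positives:
        known.setdefault(phage, set()).add(host)

    negatives = []
    for phage in sorted(known):
        known_families = {host_family[h] for h in known[phage]}
        # candidate negatives: hosts outside every family of the known hosts
        candidates = sorted(
            h for h, fam in host_family.items() if fam not in known_families
        )
        for host in rng.sample(candidates, min(n_per_phage, len(candidates))):
            negatives.append((phage, host))
    return negatives


# Toy usage (illustrative taxonomy and pairs)
host_family = {
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Bacillus": "Bacillaceae",
    "Vibrio": "Vibrionaceae",
}
positives = [("T4", "Escherichia"), ("phi29", "Bacillus")]
negatives = sample_negatives(positives, host_family, n_per_phage=2)
```

Note that pairs sampled this way are only assumed to be non-interacting; they have not been tested, which is part of what makes the training data "ugly".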
Machine learning algorithms - GNN
Hamilton et al., arXiv, 2017
Machine learning algorithms - GNN
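The core idea of a GNN layer, where each node updates its representation from its neighbours, can be sketched without any library. The slides do not say which architecture is meant, so this assumes a mean-aggregation step in the style of GraphSAGE: the new vector of a node is ReLU(W · concat(own vector, mean of neighbour vectors)). The graph, features and the identity weight matrix are toy values chosen so the result is easy to check by hand; a trained model would learn W.

```python
def mean_aggregate_layer(features, neighbors, weights):
    """One message-passing step with mean aggregation and ReLU.

    features:  dict node -> list of floats (all the same length d)
    neighbors: dict node -> list of neighbouring nodes
    weights:   matrix (list of rows), each row of length 2 * d
    """
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h_v in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # element-wise mean of the neighbours' feature vectors
            h_n = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            h_n = [0.0] * dim
        x = h_v + h_n  # list concatenation: [own features, aggregated neighbours]
        updated[node] = [
            max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights
        ]
    return updated


# Toy phage-host graph (hypothetical nodes and features)
features = {"phage1": [1.0, 0.0], "hostA": [0.0, 1.0], "hostB": [0.0, 3.0]}
neighbors = {"phage1": ["hostA", "hostB"], "hostA": ["phage1"], "hostB": ["phage1"]}
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out = mean_aggregate_layer(features, neighbors, identity)
```

After one step the phage node carries information about its host neighbours, which is what lets such models score phage-host links.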
Machine learning algorithms - Random Forest
https://catalyst.earth/catalyst-system-files/help/concepts/focus_c/oa_classif_intro_rt.html