Lucia-Sanz Adriana, Peng Shengyun, Leung Chung Yin Joey, Gupta Animesh, Meyer Justin R, Weitz Joshua S
School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Adobe Inc., Palo Alto, CA 95110, USA.
Virus Evol. 2024 Nov 29;10(1):veae104. doi: 10.1093/ve/veae104. eCollection 2024.
The enormous diversity of bacteriophages and their bacterial hosts makes it challenging to predict which phages infect a focal set of bacteria. Infection is largely determined by the complementary (and largely uncharacterized) genetics of adsorption, injection, cell take-over, and lysis. Here we present a machine learning approach to predict phage-bacteria interactions, trained on the genome sequences of, and phenotypic interactions among, 51 Escherichia coli strains and 45 phage λ strains that coevolved in laboratory conditions for 37 days. Leveraging multiple inference strategies and without knowledge of driver mutations, this framework predicts both who infects whom and the quantitative levels of infection across a suite of 2,295 potential interactions. We found that the most effective approach inferred interaction phenotypes from independent contributions of phage and bacterial mutations, accurately predicting 86% of interactions while reducing the relative error in the estimated strength of the infection phenotype by 40%. Feature selection revealed key phage λ and Escherichia coli mutations that have a significant influence on the outcome of phage-bacteria interactions, corroborating sites previously known to affect phage λ infections and identifying mutations in genes of unknown function not previously shown to influence bacterial resistance. The method's success in recapitulating strain-level infection outcomes arising during coevolutionary dynamics may also help inform generalized approaches for imputing genetic drivers of interaction phenotypes in complex communities of phage and bacteria.
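As an illustration of what "independent contributions" from phage and bacterial mutations can mean in practice, the following is a minimal sketch, not the authors' implementation. It assumes binary mutation-presence matrices for hosts and phages, uses closed-form ridge regression as a stand-in for whatever estimator the study used, and runs on synthetic data. All names (`build_additive_design`, `fit_ridge`, the mutation counts) are hypothetical; only the 51 by 45 dimensions come from the abstract.

```python
import numpy as np

def build_additive_design(host_muts, phage_muts):
    """One row per (host, phage) pair: [1, host mutations, phage mutations].

    Rows are ordered host-major: row i * n_phages + j is host i with phage j.
    """
    n_hosts, n_phages = host_muts.shape[0], phage_muts.shape[0]
    host_part = np.repeat(host_muts, n_phages, axis=0)  # each host row repeated per phage
    phage_part = np.tile(phage_muts, (n_hosts, 1))      # phage rows cycle within each host
    intercept = np.ones((n_hosts * n_phages, 1))
    return np.hstack([intercept, host_part, phage_part])

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept (column 0) is not penalised."""
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Synthetic toy data (NOT from the study); mutation counts are arbitrary assumptions.
rng = np.random.default_rng(0)
n_hosts, n_phages, n_host_muts, n_phage_muts = 51, 45, 8, 6
host_muts = rng.integers(0, 2, size=(n_hosts, n_host_muts)).astype(float)
phage_muts = rng.integers(0, 2, size=(n_phages, n_phage_muts)).astype(float)

X = build_additive_design(host_muts, phage_muts)
true_w = rng.normal(size=X.shape[1])                     # simulated per-mutation effects
y = X @ true_w + 0.01 * rng.normal(size=X.shape[0])      # simulated infection strengths

w = fit_ridge(X, y, alpha=1e-3)
pred = (X @ w).reshape(n_hosts, n_phages)                # predicted host-by-phage matrix
rmse = float(np.sqrt(np.mean((X @ w - y) ** 2)))
```

In this additive form, each mutation receives a single weight regardless of the partner strain, so the fitted weights can be ranked to suggest candidate driver mutations. A model with host-by-phage interaction terms would need additional product features.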