Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America.
PLoS Comput Biol. 2024 Sep 18;20(9):e1011649. doi: 10.1371/journal.pcbi.1011649. eCollection 2024 Sep.
Viruses of microbes are ubiquitous biological entities that reprogram their hosts' metabolisms during infection in order to produce viral progeny, impacting the ecology and evolution of microbiomes with broad implications for human and environmental health. Advances in genome sequencing have led to the discovery of millions of novel viruses and an appreciation for the great diversity of viruses on Earth. Yet, with knowledge of only "who is there?" we fall short in our ability to infer the impacts of viruses on microbes at population, community, and ecosystem scales. To do this, we need a more explicit understanding of "who do they infect?" Here, we developed a novel machine learning (ML) model, Virus-Host Interaction Predictor (VHIP), to predict virus-host interactions (infection/non-infection) from input virus and host genomes. The model was trained and tested on a high-value, manually curated set of 8849 virus-host pairs and their corresponding sequence data. The resulting dataset, 'Virus Host Range network' (VHRnet), is core to VHIP functionality. Each data point underlying VHIP training and testing represents a lab-tested virus-host pair in VHRnet, for which meaningful signals of viral adaptation to the host were computed from the genomic sequences. VHIP departs from existing virus-host prediction models in its ability to predict multiple interactions rather than a single most likely host or host clade. As a result, VHIP is able to infer the complexity of virus-host networks in natural systems. VHIP predicts interactions between virus-host pairs at the species level with 87.8% accuracy and can be applied to novel viral and host population genomes reconstructed from metagenomic datasets.
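The distinction drawn above, between predicting every interaction and predicting a single most likely host, can be illustrated with a minimal Python sketch. It does not reproduce VHIP's features, model, or interface: the function names, the 0.5 threshold, and the scores are hypothetical placeholders chosen for illustration only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (virus, host)

# Hypothetical infection scores for illustration only; these are NOT VHIP outputs.
toy_scores: Dict[Pair, float] = {
    ("virus_A", "host_1"): 0.91,
    ("virus_A", "host_2"): 0.78,
    ("virus_A", "host_3"): 0.12,
    ("virus_B", "host_2"): 0.66,
    ("virus_B", "host_3"): 0.08,
}


def predict_network(scores: Dict[Pair, float], threshold: float = 0.5) -> List[Pair]:
    """Pairwise view: call every (virus, host) pair independently.

    A virus can therefore be linked to several hosts (or to none).
    The threshold is an assumed cutoff, not a value from the paper.
    """
    return sorted(pair for pair, score in scores.items() if score >= threshold)


def predict_single_host(scores: Dict[Pair, float]) -> Dict[str, str]:
    """Single-host view, for contrast: keep only the top-scoring host per virus."""
    best: Dict[str, Tuple[str, float]] = {}
    for (virus, host), score in scores.items():
        if virus not in best or score > best[virus][1]:
            best[virus] = (host, score)
    return {virus: host for virus, (host, _) in best.items()}


network = predict_network(toy_scores)
single_host = predict_single_host(toy_scores)
```

With these toy scores, the pairwise view keeps both hosts of `virus_A`, whereas the single-host view keeps only `host_1`. That lost edge is the kind of network structure a single-host prediction cannot represent.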