DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
Instituto de Ciencias del Mar (ICM-CSIC), Barcelona, Spain.
PLoS Biol. 2023 Apr 21;21(4):e3002083. doi: 10.1371/journal.pbio.3002083. eCollection 2023 Apr.
The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.
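The precision/recall trade-off mentioned above can be made concrete with a small sketch. This is an illustration only, not iPHoP's own evaluation code: the function name `evaluate_host_predictions`, the use of `None` for "no prediction", and the choice to measure recall over all viruses in the benchmark are assumptions made for this example.

```python
from typing import Optional


def evaluate_host_predictions(
    predicted: list[Optional[str]], truth: list[str]
) -> dict[str, float]:
    """Score genus-level host predictions (hypothetical helper).

    `predicted[i]` is the predicted host genus for virus i, or None when the
    tool returned no prediction. `truth[i]` is the known host genus.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    # Keep only the viruses for which a prediction was actually made.
    made = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    correct = sum(1 for p, t in made if p == t)

    # Precision: fraction of emitted predictions that are correct.
    precision = correct / len(made) if made else 0.0
    # Recall (assumed definition): fraction of all viruses correctly assigned.
    recall = correct / len(truth) if truth else 0.0
    # False discovery rate: fraction of emitted predictions that are wrong.
    fdr = 1.0 - precision if made else 0.0
    return {"precision": precision, "recall": recall, "fdr": fdr}


# Toy data: one virus gets no prediction, one gets a wrong genus.
predicted = ["Escherichia", None, "Bacillus", "Pseudomonas"]
truth = ["Escherichia", "Salmonella", "Bacillus", "Vibrio"]
scores = evaluate_host_predictions(predicted, truth)
print(scores)
```

In this toy case, the missing prediction lowers recall without affecting precision, while the wrong genus lowers both and raises the false discovery rate. These are the two failure modes the abstract describes.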