Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec City, Québec G1V 0A6, Canada.
Groupe de recherche en écologie buccale, Faculté de médecine dentaire, Université Laval, Québec City, Québec G1V 0A6, Canada.
Nucleic Acids Res. 2021 Apr 6;49(6):3127-3138. doi: 10.1093/nar/gkab133.
Thousands of new phages have recently been discovered thanks to viral metagenomics. These phages are extremely diverse, and their genome sequences often do not resemble those of any known phage. To appreciate their ecological impact, it is important to determine their bacterial hosts. CRISPR spacers can be used to predict the hosts of unknown phages, as spacers represent biological records of past phage-bacteria interactions. However, no guidelines have been established to standardize host prediction based on CRISPR spacers. Additionally, there are no tools that use spacers to perform host predictions on large viral datasets. Here, we developed a set of tools that includes all the necessary steps for predicting the hosts of uncharacterized phages. We created a database of >11 million spacers and a program to execute host predictions on large viral datasets. Our host prediction approach uses biological criteria inspired by how CRISPR-Cas systems naturally work as adaptive immune systems, which makes the results easy to interpret. We evaluated its performance using 9484 phages with known hosts and obtained a recall of 49% and a precision of 69%. We also found that this host prediction method yielded higher performance for phages that infect gut-associated bacteria, suggesting it is well suited for gut-virome characterization.
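The general idea of spacer-based host prediction and its evaluation can be illustrated with a minimal Python sketch. This is not the authors' program, and the abstract does not specify their biological criteria. The sketch assumes exact spacer matches on either strand, assigns the host with the most distinct matching spacers, and makes no prediction on a tie. It also assumes that recall is the fraction of all phages with a correct prediction and that precision is the fraction of predicted phages that are correct. The function names `predict_host` and `recall_and_precision` are hypothetical.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def predict_host(phage_genome, spacers_by_host):
    """Predict a host from spacer matches (illustrative assumptions only).

    spacers_by_host maps a host name to a list of its CRISPR spacers.
    A spacer counts as a hit if it occurs exactly in the phage genome on
    either strand. Returns the host with the most distinct hits, or None
    when there are no hits or the top two hosts are tied.
    """
    genome = phage_genome.upper()
    hits = Counter()
    for host, spacers in spacers_by_host.items():
        for spacer in {s.upper() for s in spacers}:
            if spacer in genome or reverse_complement(spacer) in genome:
                hits[host] += 1
    ranked = hits.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: two hosts share the top hit count
    return ranked[0][0]


def recall_and_precision(predictions, true_hosts):
    """Compute (recall, precision) under the assumed definitions.

    predictions maps phage ID to a predicted host or None.
    true_hosts maps phage ID to its known host.
    """
    total = len(true_hosts)
    predicted = [p for p in true_hosts if predictions.get(p) is not None]
    correct = sum(1 for p in predicted if predictions[p] == true_hosts[p])
    recall = correct / total if total else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision
```

Under these definitions, a method that declines to predict when evidence is weak lowers recall but can keep precision high, which is one way to read the gap between the reported 49% recall and 69% precision.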