Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics.

Author information

Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China.

Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China.

Publication information

BMC Biol. 2021 Jan 14;19(1):5. doi: 10.1186/s12915-020-00938-6.

Abstract

BACKGROUND

Viruses are ubiquitous biological entities, estimated to be the largest reservoir of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools that enable taxonomic assignment and determination of the host range and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses. For these viruses, identifying the host is an essential step in characterization, because the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model that predicts hosts for prokaryotic viruses with better performance than previous computational methods.

RESULTS

We present Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, taking the differences in k-mer frequencies between viral and host genomic sequences as features. PHP achieved a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. This accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28-34%, genus level). PHP also substantially outperformed these two alignment-free methods (24-38% vs 18-20%, genus level) when predicting hosts for prokaryotic viruses whose hosts cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
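
To make the feature and post-processing ideas above concrete, the following is a minimal Python sketch, not the PHP implementation. It computes a k-mer frequency difference vector for a virus-host pair, then applies a score threshold and a top-N consensus vote to a ranked list of candidate hosts. The function names, the choice of k = 4, the rule that the threshold is applied to the best score, and the majority-vote consensus are all assumptions made for illustration; the abstract does not specify them. The Gaussian scoring step is not reproduced here, so the scores are assumed to come from some upstream model in which higher means more likely.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    k=4 is an illustrative default, not a value taken from the abstract.
    k-mers containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def kmer_difference(virus_seq, host_seq, k=4):
    """Feature vector: element-wise difference of k-mer frequencies."""
    virus = kmer_frequencies(virus_seq, k)
    host = kmer_frequencies(host_seq, k)
    return [v - h for v, h in zip(virus, host)]


def consensus_host(scored_hosts, top_n=30, min_score=None):
    """Pick a host genus from (genus, score) pairs, higher score = better.

    Hypothetical post-processing: return None when the best score is
    below min_score (thresholding), otherwise return the most frequent
    genus among the top_n ranked candidates (consensus).
    """
    ranked = sorted(scored_hosts, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return None
    if min_score is not None and ranked[0][1] < min_score:
        return None
    votes = Counter(genus for genus, _ in ranked[:top_n])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy sequences and made-up scores, for illustration only.
    features = kmer_difference("ACGTACGTAC", "TTGACCGTAA", k=2)
    print(len(features))  # one feature per possible 2-mer
    candidates = [("Escherichia", 0.9), ("Salmonella", 0.8), ("Escherichia", 0.7)]
    print(consensus_host(candidates, top_n=3, min_score=0.5))
```

In this sketch, declining to predict below the threshold trades coverage for accuracy, and the consensus vote lets several moderately ranked genomes from the same genus outweigh a single top-scoring outlier.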

CONCLUSIONS

The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14be/7807511/28663b46d349/12915_2020_938_Fig1_HTML.jpg
