School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
Int J Mol Sci. 2018 Jun 15;19(6):1779. doi: 10.3390/ijms19061779.
Bacteriophages, which are tremendously important to the ecology and evolution of bacteria, play a key role in the development of genetic engineering. Bacteriophage virion proteins are essential components of infectious viral particles and are responsible for several biological functions. The correct identification of bacteriophage virion proteins is of great importance for understanding both life at the molecular level and genetic evolution. However, few computational methods are available for identifying bacteriophage virion proteins. In this paper, we propose a new method to predict bacteriophage virion proteins using a Multinomial Naïve Bayes classification model based on discrete features generated from the g-gap feature tree. The accuracy of the proposed model reaches 98.37%, with a Matthews correlation coefficient (MCC) of 96.27%, in 10-fold cross-validation. This result suggests that the proposed method can be a useful approach for identifying bacteriophage virion proteins from sequence information. For the convenience of experimental scientists, a web server (PhagePred) that implements the proposed predictor is freely accessible on the Internet.
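To make the classification idea concrete, the following is a minimal sketch of a Multinomial Naïve Bayes classifier over discrete sequence features. It is not the PhagePred implementation: the abstract does not specify how the g-gap feature tree is built, so the sketch assumes plain g-gap dipeptide counts (pairs of residues separated by g positions) as a stand-in. The function and class names (`g_gap_counts`, `MultinomialNB`, `mcc`), the smoothing parameter `alpha`, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 possible residue pairs; this is the assumed feature vocabulary.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_counts(seq, g):
    """Count residue pairs separated by exactly g positions (g=0 is adjacent)."""
    counts = Counter()
    for i in range(len(seq) - g - 1):
        a, b = seq[i], seq[i + g + 1]
        # Skip pairs containing non-standard residues.
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] += 1
    return counts


class MultinomialNB:
    """Multinomial Naive Bayes on count features with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value, not from the paper)

    def fit(self, count_vectors, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_lik = {}
        for c in self.classes:
            rows = [x for x, y in zip(count_vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            totals = Counter()
            for row in rows:
                totals.update(row)
            denom = sum(totals.values()) + self.alpha * len(PAIRS)
            self.log_lik[c] = {
                p: math.log((totals[p] + self.alpha) / denom) for p in PAIRS
            }
        return self

    def predict(self, counts):
        # Log posterior up to a constant: log prior + sum(count * log likelihood).
        scores = {
            c: self.log_prior[c]
            + sum(n * self.log_lik[c][p] for p, n in counts.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up toy sequences, NOT real virion or non-virion proteins.
    train = [("AKAKAKAKAK", 1), ("KAKAKAKAKA", 1),
             ("GLGLGLGLGL", 0), ("LGLGLGLGLG", 0)]
    g = 1
    model = MultinomialNB().fit([g_gap_counts(s, g) for s, _ in train],
                                [y for _, y in train])
    print(model.predict(g_gap_counts("AKAKAK", g)))
```

In practice, one would evaluate such a model with k-fold cross-validation on a curated benchmark dataset and report accuracy together with the MCC, as the abstract does; the toy data here only demonstrates the mechanics.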