Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China.
Molecules. 2018 Aug 10;23(8):2000. doi: 10.3390/molecules23082000.
Accurate identification of phage virion proteins is a key step toward understanding their function, and it also helps clarify the mechanism by which phages lyse bacterial cells. Because traditional experimental methods for identifying phage virion proteins are time-consuming and costly, there is a pressing need for machine learning methods that can identify them accurately and efficiently. In this work, a support vector machine (SVM) based method was proposed that combines multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR), each coupled with incremental feature selection (IFS), were applied to select the optimal feature set. In a five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for researchers studying phage virion proteins.
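The g-gap dipeptide composition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the encoding, so this assumes the commonly used form: the count of each ordered residue pair separated by g intervening positions, divided by L - g - 1 for a sequence of length L. The function names, the 20-letter alphabet, and the choice of gap values are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def g_gap_dipeptide_composition(sequence, g):
    """Return 400 frequencies of residue pairs separated by g positions.

    Assumed definition: count(pair) / (L - g - 1). Pairs containing a
    non-standard residue are not counted but stay in the denominator.
    """
    total = len(sequence) - g - 1  # number of pair positions in the sequence
    if total <= 0:
        return [0.0] * len(PAIRS)  # sequence too short for this gap
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
    return [counts[p] / total for p in PAIRS]


def mixed_g_gap_features(sequence, gaps):
    """Concatenate the compositions for several gap values into one vector."""
    features = []
    for g in gaps:
        features.extend(g_gap_dipeptide_composition(sequence, g))
    return features


# Hypothetical usage: gaps 0, 1 and 2 give a 1200-dimensional feature vector.
example = mixed_g_gap_features("MKTAYIAKQR", gaps=[0, 1, 2])
```

A vector like this would then go through feature ranking and selection before being passed to an SVM; which gap values and how many features are retained is decided by the selection procedure described in the paper, not by this sketch.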