Manavalan Balachandran, Shin Tae H, Lee Gwang
Department of Physiology, Ajou University School of Medicine, Suwon, South Korea.
Institute of Molecular Science and Technology, Ajou University, Suwon, South Korea.
Front Microbiol. 2018 Mar 16;9:476. doi: 10.3389/fmicb.2018.00476. eCollection 2018.
Accurately identifying bacteriophage virion proteins among uncharacterized sequences is important for understanding the interactions between a phage and its host bacteria, and in turn for developing new antibacterial drugs. However, identifying such proteins experimentally is expensive and often time-consuming; hence, an efficient computational algorithm that predicts phage virion proteins (PVPs) prior to experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 in leave-one-out cross-validation, which was 6% higher than that of control SVM predictors trained with all features, indicating the effectiveness of the feature selection method. Furthermore, when objectively evaluated on an independent dataset, PVP-SVM outperformed the currently available method, PVPred, and two other machine-learning methods developed in this study. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.
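As an illustration of two of the feature families named above, the following minimal sketch computes amino acid composition (20 values) and dipeptide composition (400 values) for a protein sequence. It assumes the standard definitions, that is, residue and adjacent-pair frequencies over the 20 canonical amino acids; the abstract does not give the exact encoding used for PVP-SVM, and the function names here are hypothetical.

```python
from itertools import product

# The 20 canonical amino acids in alphabetical one-letter order (an assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 residue frequencies; non-canonical residues are ignored."""
    seq = seq.upper()
    total = sum(1 for residue in seq if residue in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no canonical amino acids")
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 frequencies of adjacent residue pairs (overlapping windows)."""
    seq = seq.upper()
    # Dict insertion order fixes the feature order: AA, AC, AD, ..., YY.
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs that contain a non-canonical residue
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no canonical dipeptides")
    return [count / total for count in counts.values()]


if __name__ == "__main__":
    toy = "ACAC"  # toy sequence, not taken from the study
    features = amino_acid_composition(toy) + dipeptide_composition(toy)
    print(len(features))  # 420 features before any selection step
```

Vectors of this kind, concatenated with the other feature families, would form the large candidate set from which a selection step keeps a smaller subset for the classifier.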