An Ji-Yong, Meng Fan-Rong, You Zhu-Hong, Chen Xing, Yan Gui-Ying, Hu Ji-Pu
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 21116, China.
School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, 21116, China.
Protein Sci. 2016 Oct;25(10):1825-33. doi: 10.1002/pro.2991. Epub 2016 Aug 9.
Predicting protein-protein interactions (PPIs) is a challenging task. It is essential for constructing protein interaction networks, which help us understand the mechanisms of biological systems. A number of high-throughput technologies have been developed to detect PPIs, but they have unavoidable shortcomings, including high cost, long processing times, and inherently high false-positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from solved. In this article, we propose a novel computational method called RVM-BiGP, which combines the relevance vector machine (RVM) model with Bi-gram Probabilities (BiGP) to detect PPIs from protein sequences. The main improvements are: (1) protein sequences are represented by Bi-gram probability (BiGP) features computed on a Position Specific Scoring Matrix (PSSM), which contains protein evolutionary information; (2) to reduce the influence of noise, Principal Component Analysis (PCA) is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments on the yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. These results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than the accuracy on the balanced yeast dataset.
These promising experimental results demonstrate the efficiency and robustness of the proposed method, which can serve as an automatic decision-support tool for future proteomics research. To facilitate extensive future studies, we developed a freely available web server, RVM-BiGP-PPIs, implemented in Hypertext Preprocessor (PHP), for predicting PPIs. The web server, source code, and datasets are available at http://219.219.62.123:8888/BiGP/.
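The feature pipeline summarized in the abstract (BiGP on a PSSM, then PCA) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the common bi-gram PSSM definition, in which entry (m, n) sums P[k, m] * P[k+1, n] over consecutive positions k of an L x 20 PSSM, giving a 400-dimensional vector per protein. The logistic scaling of raw PSSM scores, the concatenation used to encode a protein pair, and the helper names (`normalize_pssm`, `bigram_pssm_features`, `pair_vector`, `pca_reduce`) are all assumptions. PCA is done directly with NumPy's SVD.

```python
import numpy as np


def normalize_pssm(pssm: np.ndarray) -> np.ndarray:
    """Squash raw PSSM log-odds scores into (0, 1).

    Assumption: a logistic transform; the abstract does not specify a scaling.
    """
    return 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))


def bigram_pssm_features(p: np.ndarray) -> np.ndarray:
    """Bi-gram features of an L x 20 (already normalized) PSSM.

    B[m, n] = sum over k of p[k, m] * p[k + 1, n]; returned flattened (400 values).
    """
    p = np.asarray(p, dtype=float)
    if p.ndim != 2 or p.shape[1] != 20 or p.shape[0] < 2:
        raise ValueError("expected an L x 20 PSSM with L >= 2")
    # (20 x (L-1)) @ ((L-1) x 20) sums the outer products of consecutive rows.
    return (p[:-1].T @ p[1:]).ravel()


def pair_vector(pssm_a: np.ndarray, pssm_b: np.ndarray) -> np.ndarray:
    """Hypothetical pair encoding: concatenate both proteins' 400-dim vectors."""
    return np.concatenate([
        bigram_pssm_features(normalize_pssm(pssm_a)),
        bigram_pssm_features(normalize_pssm(pssm_b)),
    ])


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of x onto their top principal components (SVD-based PCA)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random integer stand-ins for PSSMs (real PSSMs are typically built with PSI-BLAST).
    pairs = np.vstack([
        pair_vector(rng.integers(-10, 11, (50, 20)), rng.integers(-10, 11, (60, 20)))
        for _ in range(10)
    ])
    print(pairs.shape, pca_reduce(pairs, n_components=5).shape)  # prints (10, 800) (10, 5)
```

The bi-gram matrix is computed as a single matrix product rather than a loop over positions. It captures how the evolutionary profile at one residue relates to the profile at the next, and it yields a fixed-length vector regardless of sequence length L. This fixed length is what lets proteins of different lengths share one classifier.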
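For the classification step, the following is a minimal binary RVM sketch in the style of Tipping's sparse Bayesian learning. It uses a logistic likelihood and a Laplace approximation to the weight posterior, whose mode is found by Newton (IRLS) steps. The hyperparameters are then re-estimated with alpha_i = gamma_i / w_i^2, where gamma_i = 1 - alpha_i * Sigma_ii. It is not the authors' implementation. The RBF kernel, its width `kernel_gamma`, the iteration counts, the pruning cap `alpha_max`, and the clipping of alpha are illustrative assumptions. In the described pipeline, its inputs would be the PCA-reduced pair vectors, labeled 1 (interacting) and 0 (non-interacting).

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    # Numerically stable logistic function (no overflow for large |z|).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    d2 = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


class RVMClassifier:
    """Minimal binary relevance vector machine (illustrative sketch)."""

    def __init__(self, kernel_gamma=0.5, n_iter=50, alpha_max=1e6):
        self.kernel_gamma = kernel_gamma
        self.n_iter = n_iter
        self.alpha_max = alpha_max

    def _design(self, x: np.ndarray) -> np.ndarray:
        # Basis functions: a bias column plus one RBF kernel per training point.
        k = rbf_kernel(x, self.x_train, self.kernel_gamma)
        return np.hstack([np.ones((x.shape[0], 1)), k])

    def fit(self, x, t):
        self.x_train = np.asarray(x, dtype=float)
        t = np.asarray(t, dtype=float)  # labels in {0, 1}
        phi = self._design(self.x_train)
        m = phi.shape[1]
        alpha, w, keep = np.ones(m), np.zeros(m), np.ones(m, dtype=bool)
        for _ in range(self.n_iter):
            p, a, wk = phi[:, keep], alpha[keep], w[keep]
            # Newton (IRLS) steps to the mode of the weight posterior.
            for _ in range(25):
                y = sigmoid(p @ wk)
                grad = p.T @ (t - y) - a * wk
                hess = (p.T * (y * (1.0 - y))) @ p + np.diag(a)
                step = np.linalg.solve(hess, grad)
                wk = wk + step
                if np.max(np.abs(step)) < 1e-6:
                    break
            # Laplace approximation: posterior covariance at the mode.
            y = sigmoid(p @ wk)
            sigma = np.linalg.inv((p.T * (y * (1.0 - y))) @ p + np.diag(a))
            # Evidence re-estimation alpha_i = gamma_i / w_i^2, clipped for stability.
            gamma_i = 1.0 - a * np.diag(sigma)
            a = np.clip(gamma_i / (wk**2 + 1e-12), 1e-6, self.alpha_max)
            w[keep], alpha[keep] = wk, a
            # Prune basis functions whose precision hit the cap; always keep the bias.
            keep &= alpha < self.alpha_max
            keep[0] = True
            w[~keep] = 0.0
        self.w, self.keep = w, keep
        return self

    def predict_proba(self, x) -> np.ndarray:
        return sigmoid(self._design(np.asarray(x, dtype=float)) @ self.w)

    def predict(self, x) -> np.ndarray:
        return (self.predict_proba(x) >= 0.5).astype(int)
```

Typical usage would be `RVMClassifier(kernel_gamma=0.5).fit(x_train, t_train).predict(x_test)`. To mirror the five-fold protocol, fit on four folds and score on the held-out fold, rotating five times. After fitting, `keep` marks the retained relevance vectors (index 0 is the bias). Unlike an SVM, the model outputs class probabilities and typically keeps only a few basis functions, although this sketch still evaluates the full kernel at prediction time for simplicity.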