Wang Lei, You Zhu-Hong, Chen Xing, Li Jian-Qiang, Yan Xin, Zhang Wei, Huang Yu-An
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China.
College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China.
Oncotarget. 2017 Jan 17;8(3):5149-5159. doi: 10.18632/oncotarget.14103.
Protein-protein interactions (PPIs) are not only a critical component of various biological processes in cells, but also the key to understanding the mechanisms that lead to healthy and diseased states in organisms. However, identifying interactions among proteins through biological experiments is time-consuming and costly. Hence, the development of more efficient computational methods has rapidly become an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein-protein interactions from protein amino acid sequences alone. Specifically, each protein amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated by multiple sequence alignment; the Pseudo PSSM is then used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence features. When applied to three benchmark data sets (Yeast, H. pylori, and an independent dataset) for predicting PPIs, the proposed method achieves average accuracies of 98.38%, 89.75%, and 96.25%, respectively. To further evaluate prediction performance, we also compare the proposed method with other methods on the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Therefore, our method is effective and robust and can serve as a useful tool for exploring and discovering new relationships between proteins. A web server is publicly available at http://202.119.201.126:8888/PsePSSM/ for academic use.
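The feature-extraction step can be illustrated with a minimal Python sketch. The abstract does not give the exact Pseudo PSSM formulation, so this follows the commonly used definition: the mean of each PSSM column plus, for each lag, the mean squared difference between scores that lie that many positions apart in the same column. The row-wise standardization, the `max_lag` parameter, the concatenation of the two proteins' vectors, and all function names are assumptions made for illustration, not details taken from the paper. The Rotation Forest classifier itself is not shown.

```python
import math


def normalize_pssm(pssm):
    """Standardize each row (one sequence position) across its scores.

    Assumption: zero-mean, unit-variance scaling per row; a constant row
    is mapped to all zeros to avoid dividing by zero.
    """
    normalized = []
    for row in pssm:
        mean = sum(row) / len(row)
        std = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
        normalized.append([(x - mean) / std if std > 0 else 0.0 for x in row])
    return normalized


def pse_pssm(pssm, max_lag=2):
    """Turn an L x 20 PSSM into a fixed-length vector of 20 * (1 + max_lag) values.

    pssm    : list of L rows, one per residue, each holding 20 scores
    max_lag : hypothetical parameter, the largest sequence-order lag included
    """
    n_rows = len(pssm)
    if n_rows <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_cols = len(pssm[0])
    norm = normalize_pssm(pssm)

    # Composition part: average score of each column over the whole sequence.
    features = [sum(row[j] for row in norm) / n_rows for j in range(n_cols)]

    # Sequence-order part: mean squared difference between positions i and i + lag.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (norm[i][j] - norm[i + lag][j]) ** 2 for i in range(n_rows - lag)
            )
            features.append(total / (n_rows - lag))
    return features


def pair_features(pssm_a, pssm_b, max_lag=2):
    """Describe a candidate protein pair by concatenating both feature vectors (assumption)."""
    return pse_pssm(pssm_a, max_lag) + pse_pssm(pssm_b, max_lag)
```

Because the output length depends only on `max_lag` and not on sequence length, proteins of different lengths map to vectors of the same size, which is what allows a standard classifier to be trained on the resulting pair descriptors.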