Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier.

Authors

Wang Lei, You Zhu-Hong, Xia Shi-Xiong, Liu Feng, Chen Xing, Yan Xin, Zhou Yong

Affiliations

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China.

The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China.

Publication information

J Theor Biol. 2017 Apr 7;418:105-110. doi: 10.1016/j.jtbi.2017.01.003. Epub 2017 Jan 11.

Abstract

Protein-protein interactions (PPIs) are essential to most biological processes and play a critical role in most cellular functions. With the development of high-throughput biological techniques and in silico methods, a large amount of PPI data has been generated for various organisms, but many problems remain unsolved. These factors have motivated the development of machine-learning-based in silico methods for predicting PPIs. In this study, we propose a novel method that combines an ensemble Rotation Forest (RF) classifier with the Discrete Cosine Transform (DCT) algorithm to predict interactions among proteins. Specifically, each protein's amino acid sequence is transformed into a Position-Specific Scoring Matrix (PSSM) containing biological evolutionary information; a feature vector representing that evolutionary information is then extracted using the DCT algorithm; finally, the ensemble rotation forest model is used to predict whether a given protein pair interacts. On the Yeast and H. pylori data sets, the proposed method achieved excellent results, with average accuracies of 98.54% and 88.27%, respectively. In addition, we achieved good prediction accuracies of 98.08%, 92.75%, 98.87% and 98.72% on independent data sets (C. elegans, E. coli, H. sapiens and M. musculus, respectively). To further evaluate the performance of our method, we compared it with the state-of-the-art Support Vector Machine (SVM) classifier and obtained good results. The web server, source code and Yeast data sets used in this article are freely available at http://202.119.201.126:8888/DCTRF/.
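The feature-extraction step described in the abstract (PSSM, then DCT, then a fixed-length vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dct_1d`, `dct_2d`, `dct_features`, `pair_features`), the choice of keeping a 4 x 4 block of low-frequency coefficients, and the concatenation of the two proteins' vectors are all assumptions made for illustration, since the abstract does not specify them. The PSSM is assumed to be an L x 20 list of lists, as typically produced by PSI-BLAST. The Rotation Forest classification step is omitted.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            values[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(matrix):
    """Separable 2-D DCT-II: transform every row, then every column."""
    row_transformed = [dct_1d(row) for row in matrix]
    col_transformed = [dct_1d(list(col)) for col in zip(*row_transformed)]
    # Transpose back so the result has the same shape as the input.
    return [list(row) for row in zip(*col_transformed)]


def dct_features(pssm, keep_rows=4, keep_cols=4):
    """Flatten the low-frequency (top-left) block of the 2-D DCT.

    keep_rows and keep_cols are hypothetical values chosen for this sketch;
    the output length is keep_rows * keep_cols regardless of protein length.
    """
    if len(pssm) < keep_rows or len(pssm[0]) < keep_cols:
        raise ValueError("PSSM is smaller than the requested coefficient block")
    coeffs = dct_2d(pssm)
    return [coeffs[i][j] for i in range(keep_rows) for j in range(keep_cols)]


def pair_features(pssm_a, pssm_b, keep_rows=4, keep_cols=4):
    """Assumed pair encoding: concatenate the two proteins' DCT features."""
    return dct_features(pssm_a, keep_rows, keep_cols) + dct_features(
        pssm_b, keep_rows, keep_cols
    )
```

Because the DCT concentrates most of a signal's energy in its low-frequency coefficients, truncating to a fixed block gives proteins of different lengths feature vectors of the same size, which is what a downstream classifier needs. In this sketch, calling `pair_features(pssm_a, pssm_b)` on two PSSMs of any lengths of at least 4 residues returns a list of 32 floats.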
