Predicting protein-protein interactions via multivariate mutual information of protein sequences.

Author information

Ding Yijie, Tang Jijun, Guo Fei

Affiliations

School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China.

Department of Computer Science and Engineering, University of South Carolina, Columbia, USA.

Publication information

BMC Bioinformatics. 2016 Sep 27;17(1):398. doi: 10.1186/s12859-016-1253-9.

Abstract

BACKGROUND

Protein-protein interactions (PPIs) are central to many biological processes. Many algorithms and methods have been developed to predict PPIs and protein interaction networks. However, most existing methods have limited applicability because they are computationally demanding and rely on large numbers of homologous proteins and on known interaction information for protein partners. In this paper, we propose a novel sequence-based approach that represents proteins with multivariate mutual information (MMI) features and predicts PPIs with a Random Forest (RF) classifier.

METHODS

Our method constructs a 638-dimensional vector to represent each pair of proteins. First, we cluster the twenty standard amino acids into seven functional groups and transform each protein sequence into an encoded sequence. Then, we use a novel multivariate mutual information feature representation scheme, combined with normalized Moreau-Broto autocorrelation, to extract features from the sequence information. Finally, we feed the feature vectors into a Random Forest model to distinguish interacting pairs from non-interacting pairs.
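The feature-extraction steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The seven groups are assumed to be the clustering used by the Conjoint Triad method, since the abstract does not list them. The pairwise term is a simple frequency-weighted log-ratio over adjacent group pairs rather than the paper's full multivariate (triplet) formulation. The Kyte-Doolittle hydropathy scale stands in for whichever physicochemical properties the paper feeds into the autocorrelation. The helper names (`encode`, `mi_features`, `nmbac`, `pair_vector`), the `max_lag` parameter and the example sequences are hypothetical, and the resulting vector is 80-dimensional, not 638.

```python
import math
from itertools import combinations_with_replacement

# ASSUMPTION: the seven amino-acid groups of the Conjoint Triad method;
# the abstract itself does not list the groups.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: g for g, members in enumerate(GROUPS) for aa in members}

# ASSUMPTION: Kyte-Doolittle hydropathy as a stand-in property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map residues to group indices 0-6, skipping non-standard letters."""
    return [AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP]


def mi_features(codes, n_groups=7):
    """7 group frequencies + 28 MI-style terms over adjacent group pairs."""
    if len(codes) < 2:
        raise ValueError("need at least two encoded residues")
    freq = [codes.count(g) / len(codes) for g in range(n_groups)]
    pairs = list(zip(codes, codes[1:]))
    feats = list(freq)
    for a, b in combinations_with_replacement(range(n_groups), 2):
        # Unordered pair: (a, b) and (b, a) adjacencies are counted together.
        f_ab = sum(1 for x, y in pairs if {x, y} == {a, b}) / len(pairs)
        # Frequency-weighted log-ratio (an MI-style term); 0 when the pair is absent.
        feats.append(f_ab * math.log(f_ab / (freq[a] * freq[b])) if f_ab > 0 else 0.0)
    return feats


def nmbac(seq, scale=KYTE_DOOLITTLE, max_lag=5):
    """Normalized Moreau-Broto autocorrelation for lags 1..max_lag."""
    vals = list(scale.values())
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    # z-score the property over the 20 amino acids, then read it along the sequence
    p = [(scale[aa] - mean) / std for aa in seq.upper() if aa in scale]
    if len(p) <= max_lag:
        raise ValueError("sequence shorter than max_lag + 1")
    return [
        sum(p[i] * p[i + d] for i in range(len(p) - d)) / (len(p) - d)
        for d in range(1, max_lag + 1)
    ]


def pair_vector(seq_a, seq_b, max_lag=5):
    """Concatenate per-protein features: 2 * (7 + 28 + max_lag) values."""
    def protein(seq):
        return mi_features(encode(seq)) + nmbac(seq, max_lag=max_lag)
    return protein(seq_a) + protein(seq_b)


# Arbitrary example sequences, for illustration only.
vec = pair_vector("MKTAYIAKQRQISFVKSHFSRQ", "GSHMLEDPVAGKTV")
print(len(vec))  # 80
```

Stacking such vectors for many labeled pairs gives a training matrix that could be passed to a classifier such as scikit-learn's `RandomForestClassifier`; the abstract names Random Forest but does not state its hyperparameters.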

RESULTS

To evaluate the performance of our new method, we conduct several comprehensive PPI prediction tests. Experiments show that our method outperforms other leading methods for sequence-based PPI prediction. On the S. cerevisiae PPI dataset, our method achieves 95.01 % accuracy and 92.67 % sensitivity. On the H. pylori PPI dataset, it achieves 87.59 % accuracy and 86.81 % sensitivity. In addition, we test our method on three other important PPI networks: the one-core network, the multiple-core network, and the crossover network.
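Accuracy and sensitivity, as reported above, follow the standard confusion-matrix definitions, with interacting pairs as the positive class. The short sketch below computes both; the counts are hypothetical placeholders, not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """True-positive rate: share of interacting pairs predicted as interacting."""
    return tp / (tp + fn)


# Hypothetical confusion-matrix counts, not taken from the paper.
tp, tn, fp, fn = 90, 95, 5, 10
print(f"accuracy={accuracy(tp, tn, fp, fn):.4f}, sensitivity={sensitivity(tp, fn):.4f}")
# prints: accuracy=0.9250, sensitivity=0.9000
```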

CONCLUSIONS

Compared with the Conjoint Triad method, our method improves accuracy on these three networks by 6.25, 2.06, and 18.75 %, respectively. Our proposed method is a useful tool for future proteomics studies.
