Predicting protein-protein interactions via multivariate mutual information of protein sequences.

Author information

Ding Yijie, Tang Jijun, Guo Fei

Affiliations

School of Computer Science and Technology, Tianjin University, No.135, Yaguan Road, Tianjin Haihe Education Park, Tianjin, People's Republic of China.

Department of Computer Science and Engineering, University of South Carolina, Columbia, USA.

Publication information

BMC Bioinformatics. 2016 Sep 27;17(1):398. doi: 10.1186/s12859-016-1253-9.

DOI: 10.1186/s12859-016-1253-9

PMID: 27677692

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5039908/

Abstract

BACKGROUND

Protein-protein interactions (PPIs) are central to many biological processes. Many algorithms and methods have been developed to predict PPIs and protein interaction networks. However, most existing methods have limited applicability because they are computationally demanding and rely on large numbers of homologous proteins and on interaction annotations of protein partners. In this paper, we propose a novel sequence-based approach that represents proteins with multivariate mutual information (MMI) features and predicts PPIs with a Random Forest (RF) classifier.

METHODS

Our method constructs a 638-dimensional vector to represent each pair of proteins. First, we cluster the twenty standard amino acids into seven functional groups and transform protein sequences into encoded sequences. Then, we use a novel multivariate mutual information feature representation scheme, combined with normalized Moreau-Broto autocorrelation, to extract features from protein sequence information. Finally, we feed the feature vectors into a Random Forest model to distinguish interacting pairs from non-interacting pairs.
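The encoding and feature-extraction steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the seven-group partition used by the Conjoint Triad method (the abstract does not list the groups), treats k-mers as unordered group combinations, and computes each 3-mer feature as a per-combination analogue of the identity I(a;b;c) = I(a;b) − I(a;b|c), with I(a;b|c) = H(a|c) − H(a|b,c). The z-scoring of property scales for the Moreau-Broto autocorrelation, the toy scale in the usage example, and all helper names (`encode`, `mmi_features`, `nmbac`, `pair_vector`) are hypothetical. The sketch does not try to reproduce the paper's exact 638-dimensional layout.

```python
import math
import statistics
from collections import Counter
from itertools import combinations_with_replacement

# ASSUMPTION: Conjoint Triad seven-group partition; the abstract does not list the groups.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, grp in enumerate(GROUPS) for aa in grp}
LABELS = [str(i) for i in range(len(GROUPS))]


def encode(seq: str) -> str:
    """Map each residue to its group label '0'..'6'; non-standard residues are dropped."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def _key(*groups: str) -> tuple:
    """Canonical key for an unordered group combination (assumption: order is ignored)."""
    return tuple(sorted(groups))


def _unordered_freqs(enc: str, k: int) -> Counter:
    """Relative frequency of each unordered k-mer in the encoded sequence."""
    n = len(enc) - k + 1
    if n <= 0:
        return Counter()
    counts = Counter(_key(*enc[i:i + k]) for i in range(n))
    return Counter({key: c / n for key, c in counts.items()})


def _pmi_term(joint: float, p: float, q: float) -> float:
    """joint * ln(joint / (p * q)), using the 0 * ln 0 = 0 convention."""
    if joint <= 0 or p <= 0 or q <= 0:
        return 0.0
    return joint * math.log(joint / (p * q))


def _cond_entropy_term(joint: float, marginal: float) -> float:
    """-(joint / marginal) * ln(joint / marginal); 0 if either frequency is 0."""
    if joint <= 0 or marginal <= 0:
        return 0.0
    r = joint / marginal
    return -r * math.log(r)


def mmi_features(seq: str) -> list[float]:
    """119 values: 84 unordered 3-mer MMI terms, 28 2-mer MI terms, 7 group frequencies."""
    enc = encode(seq)
    f1, f2, f3 = (_unordered_freqs(enc, k) for k in (1, 2, 3))
    feats = []
    for a, b, c in combinations_with_replacement(LABELS, 3):
        i_ab = _pmi_term(f2[_key(a, b)], f1[_key(a)], f1[_key(b)])
        h_a_given_c = _cond_entropy_term(f2[_key(a, c)], f1[_key(c)])
        h_a_given_bc = _cond_entropy_term(f3[_key(a, b, c)], f2[_key(b, c)])
        # I(a;b;c) = I(a;b) - I(a;b|c), with I(a;b|c) = H(a|c) - H(a|b,c)
        feats.append(i_ab - (h_a_given_c - h_a_given_bc))
    for a, b in combinations_with_replacement(LABELS, 2):
        feats.append(_pmi_term(f2[_key(a, b)], f1[_key(a)], f1[_key(b)]))
    feats.extend(f1[_key(a)] for a in LABELS)
    return feats


def nmbac(seq: str, scale: dict[str, float], max_lag: int) -> list[float]:
    """Normalized Moreau-Broto autocorrelation for lags 1..max_lag on one property scale."""
    vals = list(scale.values())
    mu, sd = statistics.fmean(vals), statistics.pstdev(vals) or 1.0
    x = [(scale[aa] - mu) / sd for aa in seq.upper() if aa in scale]  # z-scored (assumption)
    out = []
    for lag in range(1, max_lag + 1):
        n = len(x) - lag
        out.append(sum(x[j] * x[j + lag] for j in range(n)) / n if n > 0 else 0.0)
    return out


def pair_vector(seq_a: str, seq_b: str, scales: list[dict[str, float]], max_lag: int) -> list[float]:
    """Concatenate per-protein features of A, then B (hypothetical layout)."""
    def per_protein(seq: str) -> list[float]:
        feats = mmi_features(seq)
        for scale in scales:
            feats.extend(nmbac(seq, scale, max_lag))
        return feats
    return per_protein(seq_a) + per_protein(seq_b)


if __name__ == "__main__":
    # Hypothetical toy property scale, for illustration only.
    toy_scale = {aa: float(i) for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
    v = pair_vector("MKTAYIAKQR", "GAVLIPFMW", [toy_scale], max_lag=3)
    print(len(v))  # 244 = 2 * (119 + 3)
```

With seven groups, the MMI part gives 84 unordered 3-mer terms, 28 unordered 2-mer terms and 7 group frequencies (119 values per protein). Concatenating A then B makes the vector order-dependent; symmetric combinations are a possible alternative. The pair vectors could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`; that training step is not shown here.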

RESULTS

To evaluate the performance of our new method, we conduct several comprehensive tests for predicting PPIs. Experiments show that our method achieves better results than other leading methods for sequence-based PPI prediction. On the S. cerevisiae PPI dataset, our method achieves 95.01 % accuracy and 92.67 % sensitivity. On the H. pylori PPI dataset, it achieves 87.59 % accuracy and 86.81 % sensitivity. In addition, we test our method on three other important PPI networks: the one-core network, the multiple-core network, and the crossover network.
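Accuracy and sensitivity, the two metrics quoted above, follow the usual confusion-matrix definitions: accuracy = (TP + TN) / (TP + TN + FP + FN) and sensitivity = TP / (TP + FN), with interacting pairs as the positive class. The sketch below computes both from true and predicted labels. The `Confusion` type and the `confusion_from_labels` helper are hypothetical names, and the example labels are made up for illustration; they are not taken from the paper's datasets.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts; label 1 = interacting pair (positive class)."""
    tp: int
    tn: int
    fp: int
    fn: int

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def sensitivity(self) -> float:
        # Also called recall or true-positive rate.
        return self.tp / (self.tp + self.fn)


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count TP/TN/FP/FN from 0/1 labels of equal length."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


if __name__ == "__main__":
    cm = confusion_from_labels([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])  # toy labels
    print(cm.accuracy(), cm.sensitivity())  # 0.6 0.6666666666666666
```

In practice such counts are usually pooled or averaged across cross-validation folds; the abstract does not state how the reported figures were aggregated.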

CONCLUSIONS

Compared with the Conjoint Triad method, our method increases accuracy by 6.25, 2.06 and 18.75 %, respectively. Our proposed method is a useful tool for future proteomics studies.
