Predicting protein-protein interactions in unbalanced data using the primary structure of proteins.

Affiliation

Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 106, Taiwan

Publication information

BMC Bioinformatics. 2010 Apr 2;11:167. doi: 10.1186/1471-2105-11-167.

DOI:10.1186/1471-2105-11-167

PMID:20361868

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2868006/

Abstract

BACKGROUND

Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and to understanding the general principles of biological systems. Previous studies have shown that interacting protein pairs can be predicted from their primary structure. Most of these approaches perform satisfactorily on datasets containing equal numbers of interacting and non-interacting protein pairs. In nature, however, this ratio is highly unbalanced, and these techniques have not been comprehensively evaluated for the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are needed for such challenging tasks.

RESULTS

This study presents a method for PPI prediction based only on sequence information and makes three contributions. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is built on an efficient classification algorithm, since efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed on several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance is important to PPI predictors.
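The abstract does not spell out the probability-based encoding, so the sketch below is not the authors' mechanism. As a generic illustration of how two sequences can become one fixed-length feature vector, it assumes a plain amino-acid composition encoding; the names `composition` and `pair_features` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> list[float]:
    """Relative frequency of each standard amino acid in one sequence.

    Non-standard symbols (e.g. X, B, Z) are ignored. This is an assumed,
    generic encoding, not the probability-based mechanism from the paper.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Concatenate both composition vectors into one 40-dimensional vector."""
    return composition(seq_a) + composition(seq_b)


if __name__ == "__main__":
    vector = pair_features("MKVLAAGIV", "MSTNPKPQR")
    print(len(vector))
```

Note that simple concatenation is order-dependent: `pair_features(a, b)` and `pair_features(b, a)` differ, so a real pipeline would need to handle the symmetry of protein pairs in some way.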

CONCLUSIONS

Dealing with data imbalance is a key issue in PPI prediction since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study on this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information.
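To see why the positive-to-negative ratio matters, consider a hypothetical predictor whose true-positive rate (TPR) and false-positive rate (FPR) stay fixed as negatives are added. On a 1:r dataset its precision is TPR / (TPR + r × FPR), so precision falls as r grows even though the classifier itself is unchanged. The rates 0.9 and 0.1 below are assumed values for illustration, not results from the paper.

```python
def precision_at_ratio(tpr: float, fpr: float, negatives_per_positive: float) -> float:
    """Precision on a dataset with one positive per `negatives_per_positive` negatives.

    Per positive pair, the expected true positives are `tpr` and the expected
    false positives are `fpr * negatives_per_positive`.
    """
    true_positives = tpr
    false_positives = fpr * negatives_per_positive
    denominator = true_positives + false_positives
    return true_positives / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical operating point; ratios span the 1:1 to 1:15 range from the abstract.
    for r in (1, 3, 7, 15):
        print(f"1:{r}  precision = {precision_at_ratio(0.9, 0.1, r):.3f}")
```

Under these assumed rates, precision drops from 0.90 at 1:1 to about 0.375 at 1:15, which is why results on balanced data can overstate how a predictor behaves on realistic data.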

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abdd/2868006/34caf22eb322/1471-2105-11-167-1.jpg
