A Cascade Random Forests Algorithm for Predicting Protein-Protein Interaction Sites.

Author information

Wei Zhi-Sen, Yang Jing-Yu, Shen Hong-Bin, Yu Dong-Jun

Publication information

IEEE Trans Nanobioscience. 2015 Oct;14(7):746-60. doi: 10.1109/TNB.2015.2475359. Epub 2015 Sep 28.

DOI:10.1109/TNB.2015.2475359

Abstract

Protein-protein interactions exist ubiquitously and play important roles in the life cycles of living cells. The interaction sites (residues) are essential to understanding the underlying mechanisms of protein-protein interactions. Previous research has demonstrated that the accurate identification of protein-protein interaction sites (PPIs) is helpful for developing new therapeutic drugs because many drugs will interact directly with those residues. Because of its significant potential in biological research and drug development, the prediction of PPIs has become an important topic in computational biology. However, a severe data imbalance exists in the PPIs prediction problem, where the number of the majority class samples (non-interacting residues) is far larger than that of the minority class samples (interacting residues). Thus, we developed a novel cascade random forests algorithm (CRF) to address the serious data imbalance that exists in the PPIs prediction problem. The proposed CRF resolves the negative effect of data imbalance by connecting multiple random forests in a cascade-like manner, each of which is trained with a balanced training subset that includes all minority samples and a subset of majority samples using an effective ensemble protocol. Based on the proposed CRF, we implemented a new sequence-based PPIs predictor, called CRF-PPI, which takes the combined features of position-specific scoring matrices, averaged cumulative hydropathy, and predicted relative solvent accessibility as model inputs. Benchmark experiments on both the cross validation and independent validation datasets demonstrated that the proposed CRF-PPI outperformed the state-of-the-art sequence-based PPIs predictors. The source code for CRF-PPI and the benchmark datasets are available online at http://csbio.njust.edu.cn/bioinf/CRF-PPI for free academic use.
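The abstract describes the core idea only at a high level: each stage of the cascade is trained on a balanced subset that contains every minority sample plus a sampled portion of the majority class. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' CRF implementation. The function names (`balanced_subset`, `train_cascade`, `predict_cascade`, `train_centroid_stage`), the rule that discards majority samples a stage already scores below `drop_below`, the averaging of stage scores, and the toy centroid scorer that stands in for a random forest are all hypothetical choices made for illustration.

```python
import math
import random


def balanced_subset(minority, majority, rng):
    """Return all minority samples plus an equally sized random draw
    (without replacement) from the majority samples."""
    k = min(len(minority), len(majority))
    return list(minority) + rng.sample(list(majority), k)


def train_centroid_stage(subset):
    """Toy stand-in for a random forest (assumption, kept for runnability).

    Returns a scorer that maps a feature vector to a value in [0, 1];
    higher means closer to the minority (interacting) class centroid.
    """
    def centroid(label):
        rows = [x for x, y in subset if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos, neg = centroid(1), centroid(0)

    def score(x):
        d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total

    return score


def train_cascade(samples, train_stage, n_stages=3, drop_below=0.1, seed=0):
    """Train up to n_stages scorers, each on a balanced subset.

    Assumed cascade rule (not taken from the paper): after each stage,
    majority samples that the stage scores below drop_below are treated
    as easy negatives and removed before the next stage is trained.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == 1]  # interacting residues
    majority = [s for s in samples if s[1] == 0]  # non-interacting residues
    if not minority or not majority:
        raise ValueError("both classes must be present")
    stages = []
    for _ in range(n_stages):
        if not majority:
            break
        stage = train_stage(balanced_subset(minority, majority, rng))
        stages.append(stage)
        majority = [s for s in majority if stage(s[0]) >= drop_below]
    return stages


def predict_cascade(stages, x):
    """Average the stage scores (a simple, assumed combination rule)."""
    return sum(stage(x) for stage in stages) / len(stages)


if __name__ == "__main__":
    # Synthetic 1:10 imbalanced data; labels: 1 = interacting, 0 = not.
    minority = [([1.0 + 0.01 * i, 1.0], 1) for i in range(5)]
    majority = [([0.01 * i, 0.0], 0) for i in range(50)]
    stages = train_cascade(minority + majority, train_centroid_stage)
    print(len(stages), predict_cascade(stages, [1.0, 1.0]))
```

The `train_stage` argument can wrap any classifier that returns a score per sample, so a real random forest could replace the toy scorer without changing the cascade logic. For the authors' actual method, refer to the source code at the URL given in the abstract.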

Similar articles

Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

J Membr Biol. 2016 Apr;249(1-2):141-53. doi: 10.1007/s00232-015-9856-z. Epub 2015 Nov 12.

Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier.

J Theor Biol. 2014 May 7;348:47-54. doi: 10.1016/j.jtbi.2014.01.028. Epub 2014 Jan 31.

A Sequence-Based Dynamic Ensemble Learning System for Protein Ligand-Binding Site Prediction.

IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):901-912. doi: 10.1109/TCBB.2015.2505286. Epub 2015 Dec 3.

Predicted binding site information improves model ranking in protein docking using experimental and computer-generated target structures.

BMC Struct Biol. 2015 Nov 23;15:23. doi: 10.1186/s12900-015-0050-4.

A discriminative approach for identifying domain-domain interactions from protein-protein interactions.

Proteins. 2010 Apr;78(5):1243-53. doi: 10.1002/prot.22643.

Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1394-404. doi: 10.1109/TCBB.2015.2401018.

RVMAB: Using the Relevance Vector Machine Model Combined with Average Blocks to Predict the Interactions of Proteins from Protein Sequences.

Int J Mol Sci. 2016 May 18;17(5):757. doi: 10.3390/ijms17050757.

Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set.

BMC Bioinformatics. 2014;15 Suppl 15(Suppl 15):S9. doi: 10.1186/1471-2105-15-S15-S9. Epub 2014 Dec 3.

Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.

Biochem Biophys Res Commun. 2009 Mar 6;380(2):318-22. doi: 10.1016/j.bbrc.2009.01.077. Epub 2009 Jan 24.

Cited by

HSSPPI: hierarchical and spatial-sequential modeling for PPIs prediction.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf079.

PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy.

Biomolecules. 2024 Sep 27;14(10):1220. doi: 10.3390/biom14101220.

A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae162.

Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention.

BMC Bioinformatics. 2023 Dec 5;24(1):456. doi: 10.1186/s12859-023-05592-7.

Learning the protein language of proteome-wide protein-protein binding sites via explainable ensemble deep learning.

Commun Biol. 2023 Jan 19;6(1):73. doi: 10.1038/s42003-023-04462-5.

Recognition of Protein Network for Bioinformatics Knowledge Analysis Using Support Vector Machine.

Biomed Res Int. 2022 Apr 23;2022:2273648. doi: 10.1155/2022/2273648. eCollection 2022.

Exploring the computational methods for protein-ligand binding site prediction.

Comput Struct Biotechnol J. 2020 Feb 17;18:417-426. doi: 10.1016/j.csbj.2020.02.008. eCollection 2020.

Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets.

Int J Mol Sci. 2020 Jan 11;21(2):467. doi: 10.3390/ijms21020467.

SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences.

Bioinformatics. 2019 Jul 15;35(14):i343-i353. doi: 10.1093/bioinformatics/btz324.

Machine-learning techniques for the prediction of protein-protein interactions.

J Biosci. 2019 Sep;44(4).

PMID:26441427
