Prediction of Protein-Protein Interaction Sites by Multifeature Fusion and RF with mRMR and IFS.

Affiliations

School of Information Science and Technology, Northeast Normal University, Changchun 130024, Jilin, China.

Graduate School, Northeast Normal University, Changchun 130024, Jilin, China.

Publication information

Dis Markers. 2022 Oct 4;2022:5892627. doi: 10.1155/2022/5892627. eCollection 2022.

Abstract

Prediction of protein-protein interaction (PPI) sites is one of the most perplexing problems in drug discovery and computational biology. Although significant progress has been made by combining various machine learning techniques with a variety of distinct features, the problem remains unresolved. In this study, a technique for predicting PPI sites is presented that uses a random forest (RF) algorithm combined with the minimum redundancy maximal relevance (mRMR) approach and the incremental feature selection (IFS) method. Physicochemical properties of proteins and features of residue disorder, sequence conservation, secondary structure, and solvent accessibility are incorporated. Five 3D structural characteristics are also used to predict PPI sites. Feature analysis shows that 3D structural features such as relative solvent-accessible surface area (RASA) and surface curvature (SC) help in the prediction of PPI sites. Results show that the proposed predictor outperforms several other state-of-the-art predictors, achieving an average prediction accuracy of 81.44%, a sensitivity of 82.17%, and a specificity of 80.71%. The proposed predictor is expected to become a helpful tool for finding PPI sites, and the feature analysis presented in this study should give useful insights into protein interaction mechanisms.
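
The mRMR step ranks features by how much information each carries about the class label while penalizing overlap with features already chosen. The following is a minimal, stdlib-only sketch of the MID (mutual information difference) form of mRMR, assuming the features have already been discretized into bins; the function names and toy feature names are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Order feature names by the MID form of mRMR (hypothetical helper).

    features: dict of feature name -> list of discrete (binned) values, one per residue.
    labels:   list of class labels, e.g. 1 = interface residue, 0 = non-interface.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    ranked, remaining = [], set(features)
    while remaining:
        scores = {}
        for f in sorted(remaining):  # sorted so that ties resolve alphabetically
            if ranked:
                # Mean redundancy with the features chosen so far.
                redundancy = sum(
                    mutual_information(features[f], features[s]) for s in ranked
                ) / len(ranked)
            else:
                redundancy = 0.0
            scores[f] = relevance[f] - redundancy  # MID: relevance minus redundancy
        best = max(scores, key=scores.get)  # first maximal name wins on ties
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy usage: f1_copy duplicates f1; f2 is uninformative but not redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0, 0, 0, 1, 1, 1, 1, 0]
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
print(mrmr_rank({"f1": f1, "f1_copy": list(f1), "f2": f2}, labels))  # ['f1', 'f2', 'f1_copy']
```

A relevance-only ranking would place `f1_copy` second; the redundancy term pushes it to the end because it adds nothing beyond `f1`.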
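
IFS then walks down an mRMR ranking, evaluating the top 1, top 2, and so on, and keeps the prefix with the best score. The sketch below assumes a caller-supplied `evaluate` callback (a hypothetical name); in the setting described here it would be a cross-validated random forest, for example scikit-learn's `cross_val_score` applied to a `RandomForestClassifier` restricted to the selected columns. The scores in the usage lines are toy values, not results from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], List[float]]:
    """Evaluate nested prefixes of a ranked feature list and keep the best one.

    ranked:   feature names ordered best-first (e.g. by mRMR).
    evaluate: scores a feature subset, e.g. cross-validated accuracy of a
              random forest trained on those features (hypothetical callback).
    Returns the best-scoring prefix and the score of every prefix (the IFS curve).
    """
    curve: List[float] = []
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(list(ranked[:k]))
        curve.append(score)
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_k, best_score = k, score
    return list(ranked[:best_k]), curve


# Toy usage: made-up scores for prefixes of size 1-4 (not results from the paper).
toy_scores = {1: 0.70, 2: 0.78, 3: 0.81, 4: 0.80}
best, curve = incremental_feature_selection(
    ["f1", "f2", "f3", "f4"], lambda subset: toy_scores[len(subset)]
)
print(best)  # ['f1', 'f2', 'f3']
```

Keeping the whole curve, not just the winner, makes it easy to plot score against feature count and to see where adding features stops helping.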
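
The three reported figures fit together arithmetically: (82.17 + 80.71) / 2 = 81.44. With P positive and N negative samples, accuracy is (P·Sn + N·Sp) / (P + N), which equals the mean of sensitivity and specificity exactly when P = N, so the numbers are consistent with a balanced evaluation set; the abstract itself does not state how the test data were composed. The sketch below uses hypothetical counts of 10,000 interface and 10,000 non-interface residues to reproduce the same three percentages.

```python
def sn_sp_acc(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true interface residues recovered
    specificity = tn / (tn + fp)  # fraction of non-interface residues correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical balanced counts: 10,000 positives (tp + fn) and 10,000 negatives (tn + fp).
print(sn_sp_acc(tp=8217, fn=1783, tn=8071, fp=1929))  # (0.8217, 0.8071, 0.8144)
```

With an unbalanced set, for example twice as many negatives, the same sensitivity and specificity would give a different accuracy, weighted toward specificity.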

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65dd/9553539/77e61169591c/DM2022-5892627.001.jpg

Similar articles

1
Prediction of protein-protein interaction sites by random forest algorithm with mRMR and IFS.
PLoS One. 2012;7(8):e43927. doi: 10.1371/journal.pone.0043927. Epub 2012 Aug 28.
2
Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection.
Mol Biosyst. 2013 Jan 27;9(1):61-9. doi: 10.1039/c2mb25327e. Epub 2012 Nov 2.
3
Prediction of tyrosine sulfation with mRMR feature selection and analysis.
J Proteome Res. 2010 Dec 3;9(12):6490-7. doi: 10.1021/pr1007152. Epub 2010 Nov 11.
4
Analysis and Prediction of Myristoylation Sites Using the mRMR Method, the IFS Method and an Extreme Learning Machine Algorithm.
Comb Chem High Throughput Screen. 2017;20(2):96-106. doi: 10.2174/1386207319666161220114424.
5
Predicting DNA-binding sites of proteins based on sequential and 3D structural information.
Mol Genet Genomics. 2014 Jun;289(3):489-99. doi: 10.1007/s00438-014-0812-x. Epub 2014 Jan 22.
6
Prediction of protein modification sites of pyrrolidone carboxylic acid using mRMR feature selection and analysis.
PLoS One. 2011;6(12):e28221. doi: 10.1371/journal.pone.0028221. Epub 2011 Dec 9.
7
Predict and analyze S-nitrosylation modification sites with the mRMR and IFS approaches.
J Proteomics. 2012 Feb 16;75(5):1654-65. doi: 10.1016/j.jprot.2011.12.003. Epub 2011 Dec 11.
8
Prediction of protein domain with mRMR feature selection and analysis.
PLoS One. 2012;7(6):e39308. doi: 10.1371/journal.pone.0039308. Epub 2012 Jun 15.
