A boosting approach for prediction of protein-RNA binding residues.

Authors

Tang Yongjun, Liu Diwei, Wang Zixiang, Wen Ting, Deng Lei

Affiliations

Department of Clinical Pharmacology, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China.

Institute of Clinical Pharmacology, Hunan Key Laboratory of Pharmacogenetics, Central South University, 87 Xiangya Road, Changsha, 410008, China.

Publication information

BMC Bioinformatics. 2017 Dec 1;18(Suppl 13):465. doi: 10.1186/s12859-017-1879-2.

DOI:10.1186/s12859-017-1879-2

PMID:29219069

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5773889/

Abstract

BACKGROUND

RNA-binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Identifying the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex.

RESULTS

We propose PredRBR, an effective computational approach to predict RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, an MCC of 0.55, and an AUC of 0.92, which are significantly better than those of other widely used machine learning algorithms such as Support Vector Machine, Random Forest, and AdaBoost. We further calculate the importance of different feature categories and find that structural neighborhood characteristics are critical for recognizing RNA-binding residues. PredRBR also yields significantly better prediction accuracy on an independent test set (RBP101) than other state-of-the-art methods.
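The four figures reported above (accuracy, sensitivity, Matthews correlation coefficient, and area under the ROC curve) can be illustrated with a minimal sketch. The function names `binary_metrics` and `auc` and the counts in the usage note are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true binding residues that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC stays informative when binding residues are a small minority class.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of residues, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `binary_metrics(50, 10, 30, 10)` describes a made-up predictor that finds 50 of 60 binding residues, and `auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` scores four made-up residues. Because non-binding residues usually far outnumber binding ones, accuracy alone can look high for a weak predictor, which is one reason MCC and AUC are reported alongside it.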

CONCLUSIONS

The superior performance over existing RNA-binding residue prediction methods indicates the importance of combining the gradient tree boosting algorithm with optimally selected features.
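As a rough illustration of the general idea behind gradient tree boosting, the sketch below fits depth-one trees (stumps) to the gradient of the logistic loss, one after another. It is not the PredRBR implementation: the function names, the toy feature vectors, the fixed learning rate, and the use of plain gradient steps instead of per-leaf Newton updates are all assumptions made for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Pick the (feature, threshold) split that minimises squared error on residuals."""
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(x[j] for x in xs))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= thr]
            right = [r for x, r in zip(xs, residuals) if x[j] > thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    if best is None:
        # No feature varies: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    j, thr, left_value, right_value = stump
    return left_value if x[j] <= thr else right_value


def fit_gtb(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit an additive model of stumps; ys must contain both 0 and 1."""
    pos_rate = sum(ys) / len(ys)
    base = math.log(pos_rate / (1.0 - pos_rate))  # initial log-odds
    scores = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss with respect to the score.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, x)
                  for s, x in zip(scores, xs)]
    return base, learning_rate, stumps


def predict_proba(model, x):
    """Estimated probability that the residue with features x is binding."""
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_predict(s, x) for s in stumps))


# Toy data: two made-up features per residue; only the first is informative.
toy_x = [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.7, 0.0], [0.8, 1.0], [0.9, 0.0]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_model = fit_gtb(toy_x, toy_y)
```

Each round adds a small correction aimed at the residues the current model still gets wrong, which is the property that distinguishes boosting from averaging independent trees as in Random Forest. In practice one would use a maintained library implementation with deeper trees and tuned hyperparameters rather than this sketch.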

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f501/5773889/e7e168939875/12859_2017_1879_Fig1_HTML.jpg
