A boosting approach for prediction of protein-RNA binding residues.

Author information

Tang Yongjun, Liu Diwei, Wang Zixiang, Wen Ting, Deng Lei

Affiliations

Department of Clinical Pharmacology, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, 410008, China.

Institute of Clinical Pharmacology, Hunan Key Laboratory of Pharmacogenetics, Central South University, 87 Xiangya Road, Changsha, 410008, China.

Publication information

BMC Bioinformatics. 2017 Dec 1;18(Suppl 13):465. doi: 10.1186/s12859-017-1879-2.

Abstract

BACKGROUND

RNA binding proteins play important roles in post-transcriptional RNA processing and transcriptional regulation. Distinguishing the RNA-binding residues in proteins is crucial for understanding how protein and RNA recognize each other and function together as a complex.

RESULTS

We propose PredRBR, an effective computational approach for predicting RNA-binding residues. PredRBR is built with gradient tree boosting and an optimal feature set selected from a large number of sequence and structure characteristics and two categories of structural neighborhood properties. Cross-validation experiments on the RBP170 data set show that PredRBR achieves an overall accuracy of 0.84, a sensitivity of 0.85, a Matthews correlation coefficient (MCC) of 0.55 and an area under the ROC curve (AUC) of 0.92, significantly better than other widely used machine learning algorithms such as Support Vector Machine, Random Forest and AdaBoost. We further calculate the importance of the different feature categories and find that structural neighborhood characteristics are critical for recognizing RNA-binding residues. PredRBR also yields significantly better prediction accuracy on an independent test set (RBP101) than other state-of-the-art methods.
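The abstract reports accuracy, sensitivity, MCC and AUC without giving formulas. Assuming the standard definitions (this abstract does not state them explicitly), the metrics can be computed from per-residue labels and predicted scores roughly as follows. The labels, scores and the 0.5 decision threshold are made-up toy values, not PredRBR data, and the function names are illustrative.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels: 1 = RNA-binding residue, 0 = non-binding."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    # Fraction of all residues classified correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    # Recall on the binding class: TP / (TP + FN).
    return tp / (tp + fn) if tp + fn else 0.0

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; defined here as 0 when any marginal count is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    # AUC via its rank interpretation: the probability that a randomly chosen
    # binding residue scores higher than a non-binding one (ties count 0.5).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical labels and scores, not PredRBR output).
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn), mcc(tp, tn, fp, fn), auc(y_true, scores))
```

MCC is often preferred over plain accuracy for this task because binding residues are usually a minority class, and accuracy alone can look high for a predictor that rarely predicts "binding".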

CONCLUSIONS

The superior performance over existing RNA-binding residue prediction methods indicates the importance of combining the gradient tree boosting algorithm with optimally selected features.
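The abstract names gradient tree boosting as PredRBR's learner but gives no implementation details (library, tree depth, number of trees, learning rate or loss). As a rough illustration of the generic technique only, the sketch below implements gradient tree boosting from scratch for binary labels, assuming logistic loss, depth-1 regression trees (stumps) with Newton-step leaf values, and toy two-feature data. `train_gbt`, `fit_stump`, `newton_leaf` and `predict_proba` are hypothetical names; this is not PredRBR code.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_stump(X, residuals):
    """Pick the (feature, threshold) split that minimizes the squared error of the residuals."""
    best_sse, best_j, best_t = float("inf"), 0, float("inf")  # default: one leaf holds everything
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best_j, best_t = sse, j, t
    return best_j, best_t

def newton_leaf(indices, y, F):
    # One Newton step for the log loss: sum(negative gradient) / sum(hessian) over the leaf.
    num = sum(y[i] - sigmoid(F[i]) for i in indices)
    den = sum(sigmoid(F[i]) * (1.0 - sigmoid(F[i])) for i in indices)
    return num / den if den > 1e-12 else 0.0

def train_gbt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient tree boosting with logistic loss and stumps; y must contain both 0 and 1."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))  # initial log-odds of the binding class
    F = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        # Negative gradient of the log loss with respect to the current scores.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t = fit_stump(X, residuals)
        left = [i for i, row in enumerate(X) if row[j] <= t]
        right = [i for i, row in enumerate(X) if row[j] > t]
        v_left, v_right = newton_leaf(left, y, F), newton_leaf(right, y, F)
        trees.append((j, t, v_left, v_right))
        for i in left:
            F[i] += learning_rate * v_left
        for i in right:
            F[i] += learning_rate * v_right
    return f0, learning_rate, trees

def predict_proba(model, x):
    # Add the shrunken stump outputs to the initial log-odds, then map to a probability.
    f0, lr, trees = model
    score = f0 + sum(lr * (vl if x[j] <= t else vr) for j, t, vl, vr in trees)
    return sigmoid(score)

# Toy data: two hypothetical per-residue features; label 1 = binding residue.
X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.6, 2.0], [0.7, 6.0], [0.9, 1.0]]
y = [0, 0, 0, 1, 1, 1]
model = train_gbt(X, y)
print(predict_proba(model, [0.8, 3.5]), predict_proba(model, [0.15, 3.5]))
```

Each round fits a stump to the current residuals and adds a shrunken correction to every residue's log-odds score. A real residue-level feature set would have many more columns, and deeper trees would let the model capture interactions between sequence, structure and neighborhood features.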

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f501/5773889/e7e168939875/12859_2017_1879_Fig1_HTML.jpg