Prediction of protein-RNA binding sites by a random forest method with combined features.

Affiliation

Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.

Publication information

Bioinformatics. 2010 Jul 1;26(13):1616-22. doi: 10.1093/bioinformatics/btq253. Epub 2010 May 18.

Abstract

MOTIVATION

Protein-RNA interactions play a key role in many biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. Reliable identification of the RNA-binding sites of a protein is therefore important for functional annotation and site-directed mutagenesis. Accumulated experimental data on protein-RNA interactions reveal that an RNA-binding residue with different neighboring amino acids often exhibits different preferences for its RNA partners, and these preferences can be assessed through the interdependence between the amino acid fragment and the RNA nucleotide it interacts with.

RESULTS

In this work, we propose a novel classification method to identify the RNA-binding sites in proteins by combining a new interaction feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue for the interacting RNA nucleotide, taking into account the residue's neighbors on both sides within a protein residue triplet. Sequence- and structure-based features of the residues are combined to discriminate the interaction propensity of amino acids with RNA. We predict RNA-interacting residues in proteins with a random forest classifier. The experiments show that our method detects the annotated protein-RNA interaction sites with high accuracy. On a dataset of 205 non-homologous RNA-binding proteins, it achieves an accuracy of 84.5%, an F-measure of 0.85 and an AUC of 0.92 in predicting RNA-binding residues. In the comparison study, it also outperforms several existing RNA-binding residue predictors, such as RNABindR, BindN, RNAProB and PPRint, and some alternative machine learning methods, such as support vector machine, naive Bayes and neural network. Furthermore, we provide some biological insights into the roles of sequences and structures in protein-RNA interactions, both by evaluating how much each feature contributes to predictive accuracy and by analyzing the binding patterns of interacting residues.
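The abstract does not give a formula for the interaction propensity, so the following is only an illustrative sketch of how a triplet-level propensity could be computed. It assumes a log-odds definition (observed triplet-nucleotide contact frequency divided by the frequency expected if triplets and nucleotides were independent), which is a common choice but may differ from the authors' definition. The function names `triplets` and `interaction_propensity` and the input layout (a sequence plus a mapping from residue index to contacted nucleotide) are hypothetical.

```python
import math
from collections import Counter


def triplets(seq):
    """Yield (index of central residue, triplet) for every interior residue."""
    for i in range(1, len(seq) - 1):
        yield i, seq[i - 1:i + 2]


def interaction_propensity(proteins):
    """Log-odds propensity for each observed (triplet, nucleotide) pair.

    ASSUMPTION: this log-odds form is illustrative, not taken from the paper.
    `proteins` is an iterable of (sequence, contacts), where `contacts` maps
    the index of a binding residue to the nucleotide letter it contacts.
    """
    pair_counts = Counter()        # (triplet, nucleotide) contacts
    triplet_counts = Counter()     # every triplet, binding or not
    nucleotide_counts = Counter()  # nucleotides seen in contacts

    for seq, contacts in proteins:
        for i, tri in triplets(seq):
            triplet_counts[tri] += 1
            nt = contacts.get(i)
            if nt is not None:
                pair_counts[(tri, nt)] += 1
                nucleotide_counts[nt] += 1

    total_pairs = sum(pair_counts.values())
    total_triplets = sum(triplet_counts.values())

    propensity = {}
    for (tri, nt), n in pair_counts.items():
        observed = n / total_pairs
        expected = (triplet_counts[tri] / total_triplets) * (
            nucleotide_counts[nt] / total_pairs
        )
        propensity[(tri, nt)] = math.log(observed / expected)
    return propensity


# Toy input (made up for illustration, not from the paper's dataset).
toy = [("AKRGA", {1: "A", 2: "G"}), ("AKRGK", {2: "G"})]
scores = interaction_propensity(toy)
```

A positive score means the triplet contacts that nucleotide more often than independence would predict. In a pipeline like the one described, such scores would be one input column alongside the other sequence- and structure-based features passed to the classifier.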

AVAILABILITY

All the source data and code are available at http://www.aporc.org/doc/wiki/PRNA or http://www.sysbio.ac.cn/datatools.asp

CONTACT

lnchen@sibs.ac.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
