Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.

Authors

Jahandideh Samad, Srinivasasainagendra Vinodh, Zhi Degui

Affiliations

Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA.

Publication information

J Theor Biol. 2012 Nov 7;312:65-75. doi: 10.1016/j.jtbi.2012.07.013. Epub 2012 Aug 3.

Abstract

RNA-protein interactions play an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, we used the Gene Ontology Annotation (GOA) and Structural Classification of Proteins (SCOP) databases to design an automatic procedure that captures structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) to analyze and classify RNA-binding protein domains based on a comprehensive set of sequence and structural features, and compared the prediction accuracy of these three state-of-the-art methods. In our results, TMCSVM outperformed the other methods, which suggests its potential as a useful tool for multi-class prediction of RNA-binding protein domains. MCRLR, on the other hand, elucidates the importance of individual features for the predictive accuracy of RNA-binding protein domain subclasses, and thereby provides some biological insight into the roles of sequence and structure in protein-RNA interactions.
