

Prediction of RNA-binding proteins from primary sequence by a support vector machine approach.

Author information

Han Lian Yi, Cai Cong Zhong, Lo Siew Lin, Chung Maxey C M, Chen Yu Zong

Affiliation

Department of Computational Science, National University of Singapore, Singapore 117543.

Publication information

RNA. 2004 Mar;10(3):355-68. doi: 10.1261/rna.5890304.

Abstract

Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. But insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins was used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, indicating a need for a sufficient number of proteins to train SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.
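As a rough illustration of the workflow described in the abstract, the Python sketch below turns a primary sequence into a fixed-length numeric vector and reports accuracy separately for the positive and negative classes, which is how the abstract states its results. The amino-acid composition featurization is an assumption made for illustration only; the abstract does not say which sequence-derived descriptors the SVM was trained on. The function names `composition` and `per_class_accuracy` and the toy labels are likewise hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: a simple composition vector stands in for whatever
    sequence-derived features the original SVM system used.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def per_class_accuracy(y_true, y_pred):
    """Return (accuracy on positives, accuracy on negatives).

    Labels are 1 for binding and 0 for non-binding proteins.
    """
    pairs = list(zip(y_true, y_pred))
    positives = sum(1 for t, _ in pairs if t == 1)
    negatives = len(pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    return true_pos / positives, true_neg / negatives


# Toy usage with made-up data (not taken from the paper).
features = composition("MKRAAKK")
pos_acc, neg_acc = per_class_accuracy([1, 1, 0, 0, 0], [1, 0, 0, 0, 0])
```

Vectors like `features` could be passed to any SVM implementation. Reporting the two accuracies separately matters when the classes are imbalanced, as in the independent evaluation set of 447 RNA-binding and 4881 non-RNA-binding proteins: a classifier can score very high on the large negative class while missing much of a small positive class, which is the pattern reported for snRNA-binding proteins.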


