Prediction of RNA-binding proteins from primary sequence by a support vector machine approach.

Authors

Han Lian Yi, Cai Cong Zhong, Lo Siew Lin, Chung Maxey C M, Chen Yu Zong

Affiliation

Department of Computational Science, National University of Singapore, Singapore 117543.

Publication information

RNA. 2004 Mar;10(3):355-68. doi: 10.1261/rna.5890304.

PMID: 14970381

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1370931/

Abstract

Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. But insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins was used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, indicating a need for a sufficient number of proteins to train SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.
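
The abstract reports accuracy separately for binding and non-binding proteins, which matters because the evaluation set is imbalanced (447 RNA-binding versus 4881 non-RNA-binding sequences). The sketch below is illustrative only and is not the authors' implementation. It assumes a plain amino-acid composition vector as a stand-in for whatever sequence-derived descriptors SVMProt actually uses (the abstract does not specify them), and it assumes labels of +1 (binding) and -1 (non-binding). The function names `amino_acid_composition` and `per_class_accuracy` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Turn a primary sequence into a fixed-length vector of residue fractions.

    Assumption: this simple encoding stands in for the real descriptors,
    which the abstract does not describe.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def per_class_accuracy(true_labels, predicted_labels):
    """Return (accuracy on +1 proteins, accuracy on -1 proteins)."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == 1:
            if guess == 1:
                tp += 1  # binding protein correctly recognized
            else:
                fn += 1  # binding protein missed
        else:
            if guess == -1:
                tn += 1  # non-binding protein correctly rejected
            else:
                fp += 1  # non-binding protein wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in true_labels")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy sequence and toy labels, invented purely for illustration.
    features = amino_acid_composition("MKRLAAKK")
    print(len(features))

    truth = [1, 1, 1, 1, -1, -1]
    guess = [1, 1, 1, -1, -1, -1]
    print(per_class_accuracy(truth, guess))
```

A vector such as the one returned by `amino_acid_composition` could be passed to any SVM implementation. Reporting the two accuracies separately, as the abstract does, prevents a large negative class from masking poor recognition of the positive class, which is the pattern seen in the snRNA result (40.0% versus 99.9%).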
