Han Lian Yi, Cai Cong Zhong, Lo Siew Lin, Chung Maxey C M, Chen Yu Zong
Department of Computational Science, National University of Singapore, Singapore 117543.
RNA. 2004 Mar;10(3):355-68. doi: 10.1261/rna.5890304.
Elucidating how proteins interact with other molecules is important for understanding cellular processes. Computational methods have been developed for predicting protein-protein interactions, but insufficient attention has been paid to predicting protein-RNA interactions, which play central roles in regulating gene expression and in certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for predicting RNA-binding proteins directly from their primary sequence. Using known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins were used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Results on this independent evaluation set show prediction accuracies of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% for snRNA-binding and 99.9% for non-snRNA-binding proteins, indicating that a sufficient number of proteins is needed to train an SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.
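The abstract reports accuracy separately for the binding and non-binding classes of each RNA type. The sketch below illustrates one way such per-class figures can be computed, assuming each figure is the fraction of proteins in that class that were classified correctly (the abstract does not spell out the formula). The function name `per_class_accuracy` and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence, Tuple


def per_class_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy on positives, accuracy on negatives) as fractions.

    Labels: 1 = binds the RNA class of interest, 0 = does not.
    Assumption: per-class accuracy = correct predictions in the class
    divided by the number of proteins in that class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Correctly predicted binders and non-binders.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / n_pos, tn / n_neg


# Toy, made-up labels: 5 binders and 10 non-binders.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

pos_acc, neg_acc = per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"binding class accuracy:     {pos_acc:.1%}")
print(f"non-binding class accuracy: {neg_acc:.1%}")
print(f"overall accuracy:           {overall:.1%}")
```

Reporting the two classes separately matters when the negative class is much larger than the positive one: a classifier can look accurate overall while missing most binders, which is the pattern the snRNA result (40.0% versus 99.9%) shows.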