Panwar Bharat, Raghava Gajendra P S
Bioinformatics Centre, CSIR-Institute of Microbial Technology, Sector 39A, Chandigarh, India.
Bioinformatics Centre, CSIR-Institute of Microbial Technology, Sector 39A, Chandigarh, India. Electronic address: http://www.imtech.res.in/raghava/
Genomics. 2015 Apr;105(4):197-203. doi: 10.1016/j.ygeno.2015.01.005. Epub 2015 Jan 30.
RNA-protein interactions play diverse roles in cells, so identifying RNA-protein interfaces is essential for biologists seeking to understand their function. Several methods have been developed for predicting RNA-interacting residues in proteins, but limited effort has been made to identify protein-interacting nucleotides in RNAs. To discriminate protein-interacting from non-interacting nucleotides, we developed prediction models using various features and classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest and SVM(light)). The best performance was achieved by SVM(light)-based models: 83.92% sensitivity, 84.82% specificity, 84.62% accuracy and a Matthews correlation coefficient of 0.62. We observed that certain tri-nucleotides, such as ACA, ACC, AGA, CAC, CCA, GAG, UGA and UUU, are preferred in protein interaction. All models were developed on a non-redundant dataset and evaluated using five-fold cross-validation. A web server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/).
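The four performance measures quoted above follow the standard definitions for a binary classifier, computed from the counts of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP), where a "positive" is a protein-interacting nucleotide. The sketch below shows those textbook formulas. The function name `binary_metrics` and the example counts are hypothetical illustrations and are not taken from the paper or from RNApin.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification measures (hypothetical helper, not RNApin code).

    Returns (sensitivity %, specificity %, accuracy %, Matthews correlation coefficient).
    """
    # Sensitivity: share of protein-interacting nucleotides predicted as interacting
    sensitivity = 100.0 * tp / (tp + fn)
    # Specificity: share of non-interacting nucleotides predicted as non-interacting
    specificity = 100.0 * tn / (tn + fp)
    # Accuracy: share of all nucleotides classified correctly
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    # MCC ranges from -1 to 1 and is defined as 0 when any marginal count is zero
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical confusion-matrix counts, chosen only for illustration
sens, spec, acc, mcc = binary_metrics(tp=80, fn=20, tn=170, fp=30)
print(f"Sn={sens:.2f}% Sp={spec:.2f}% Acc={acc:.2f}% MCC={mcc:.2f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when interacting and non-interacting nucleotides are unequally represented, which is why it is commonly reported alongside accuracy.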
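The abstract does not spell out the feature encodings, but the tri-nucleotide preferences it reports suggest composition-style features. One common way to turn sequence context into a fixed-length vector is to take a window centred on each target nucleotide and count its 64 overlapping tri-nucleotides. The sketch below illustrates that idea only. The window half-width of 8, the padding character `X`, and the helper names `centered_window` and `trinucleotide_composition` are assumptions for illustration, not necessarily the features RNApin uses.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# All 64 ordered tri-nucleotides: AAA, AAC, ..., UUU
TRINUCLEOTIDES = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]


def centered_window(seq, pos, half_width=8, pad="X"):
    """Return the window of length 2*half_width+1 centred on seq[pos] (hypothetical helper).

    Ends of the RNA are padded with a dummy character so every nucleotide gets a
    window of the same length.
    """
    padded = pad * half_width + seq + pad * half_width
    return padded[pos : pos + 2 * half_width + 1]


def trinucleotide_composition(window):
    """Fraction of each of the 64 tri-nucleotides among the window's overlapping triplets.

    Triplets containing padding or non-ACGU characters are skipped.
    """
    window = window.upper().replace("T", "U")  # accept DNA-style input as well
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(window) - 2):
        tri = window[i : i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]


# Example: a 64-dimensional feature vector for the nucleotide at position 5
rna = "GGACACCAGAGUGAUUU"
features = trinucleotide_composition(centered_window(rna, 5))
print(len(features))
```

Each nucleotide's vector could then be labelled interacting or non-interacting and passed to any of the classifiers listed in the abstract. The classifier training and the five-fold split are outside this sketch.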