Sun Meijian, Wang Xia, Zou Chuanxin, He Zenghui, Liu Wei, Li Honglin
State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China.
BMC Bioinformatics. 2016 Jun 7;17(1):231. doi: 10.1186/s12859-016-1110-x.
RNA-binding proteins participate in many important biological processes involving RNA-mediated gene regulation, and several computational methods have recently been developed to predict their protein-RNA interactions. Newly developed discriminative descriptors can improve the accuracy of these methods and provide further meaningful information for researchers.
In this work, we designed two structural features: residue electrostatic surface potential and triplet interface propensity. Statistical and structural analysis of protein-RNA complexes showed that both features are powerful for identifying RNA-binding protein residues. Combining these two features with other effective structure- and sequence-based features, we constructed a random forest classifier to predict RNA-binding residues. In five-fold cross-validation on the training set RBP195, the area under the receiver operating characteristic curve (AUC) was 0.900; on the test set RBP68, the prediction accuracy (ACC) was 0.868 and the F-score was 0.631.
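For readers less familiar with the three reported metrics, the following is a minimal sketch of how ACC, F-score, and AUC can be computed for a per-residue binary classifier. It is an illustration only, not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions, and the F-score is assumed to be the standard F1 (harmonic mean of precision and recall).

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = RNA-binding residue."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ACC = (TP + TN) / all residues."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def f_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); assumed here to be the F-score meant in the text."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(P*N) and is meant
    for clarity, not for large data sets.
    """
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 binding and 5 non-binding residues with classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
f1 = f_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"ACC={acc:.3f}  F-score={f1:.3f}  AUC={auc:.3f}")
```

ACC and F-score depend on the chosen decision threshold, whereas AUC is computed from the ranking of scores alone. Because ACC also counts correctly rejected non-binding residues, it can be noticeably higher than the F-score when binding residues are the minority class.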
The good prediction performance of our method indicates that the two newly designed descriptors are discriminative for inferring protein residues that interact with RNA. To facilitate its use, we built a web server called RNAProSite that implements the proposed method; it is freely available at http://lilab.ecust.edu.cn/NABind .