Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, K K Birla Goa campus, Zuarinagar, South Goa, Goa, India.
Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland.
Sci Rep. 2018 Jun 22;8(1):9552. doi: 10.1038/s41598-018-27814-2.
RNA-protein interactions (RPI) play a pivotal role in regulating various biological processes. Experimental validation of RPI is time-consuming, which has motivated computational prediction methods. The major limitation of these methods has been the accuracy and confidence of their predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on high-resolution structural data of RNA-protein complexes. The minimum structural unit, consisting of five residues, is used as the descriptor. Comparative analysis with existing methods shows consistently higher performance of our method irrespective of the length of the RNA in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA and TERRA RNA. It is also shown to successfully predict RNA and protein hubs in the RPI networks of four different organisms. The robustness of this method provides a way to predict RPI networks of as yet unknown interactions for both long noncoding RNA and microRNA.
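As a rough illustration of how a sequence pair might be turned into a fixed-length descriptor for a classifier, the sketch below encodes a protein with a reduced amino-acid alphabet and counts short k-mers in both sequences. Everything in it is an assumption made for illustration: the seven amino-acid groups, the k-mer lengths, and the function names (`encode_protein`, `kmer_frequencies`, `pair_features`) are hypothetical and are not the structure-derived classes or the five-residue unit described in the abstract.

```python
from collections import Counter
from itertools import product

# Illustrative amino-acid grouping (an assumption, NOT the classes the
# abstract derives from structural data of RNA-protein complexes).
AA_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(AA_CLASSES) for aa in group}
CLASS_ALPHABET = "".join(str(i) for i in range(len(AA_CLASSES)))


def encode_protein(seq):
    """Map each amino acid to its class label, skipping unknown symbols."""
    return "".join(AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS)


def kmer_frequencies(encoded, alphabet, k):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    counts = Counter(encoded[i:i + k] for i in range(len(encoded) - k + 1))
    total = sum(counts.values())
    kmers = ("".join(p) for p in product(alphabet, repeat=k))
    # Sequences shorter than k yield an all-zero vector.
    return [counts[km] / total if total else 0.0 for km in kmers]


def pair_features(protein, rna, k_protein=3, k_rna=2):
    """Concatenate protein and RNA k-mer frequencies into one vector.

    The k values are arbitrary placeholders chosen for this sketch.
    """
    prot = kmer_frequencies(encode_protein(protein), CLASS_ALPHABET, k_protein)
    # Treat DNA-style T as U and drop any non-ACGU symbols.
    rna_clean = "".join(b for b in rna.upper().replace("T", "U") if b in "ACGU")
    nuc = kmer_frequencies(rna_clean, "ACGU", k_rna)
    return prot + nuc


if __name__ == "__main__":
    # Toy protein fragment paired with two TERRA-like UUAGGG repeats.
    features = pair_features("MKRLDEAGVC", "UUAGGGUUAGGG")
    print(len(features))
```

Vectors of this kind, one per labeled RNA-protein pair, could then be passed to any gradient boosting implementation, for example scikit-learn's `GradientBoostingClassifier`. Because the vector length is fixed regardless of sequence length, short RNAs and long noncoding RNAs are represented in the same feature space.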