Yi Hai-Cheng, You Zhu-Hong, Huang De-Shuang, Li Xiao, Jiang Tong-Hai, Li Li-Ping
Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
Mol Ther Nucleic Acids. 2018 Jun 1;11:337-344. doi: 10.1016/j.omtn.2018.03.001. Epub 2018 Mar 9.
Interactions between non-coding RNAs (ncRNAs) and proteins play an important role in many biological processes, and ncRNAs achieve their biological functions primarily by binding to a variety of proteins. High-throughput biological techniques can identify the proteins bound to a specific ncRNA, but they are usually expensive and time-consuming. Deep learning offers a powerful computational approach to predicting RNA-protein interactions. In this work, we propose the RPI-SAN model, which uses a deep-learning stacked auto-encoder network to mine hidden high-level features from RNA and protein sequences and feeds them into a random forest (RF) model to predict ncRNA-binding proteins. Stacked ensembling is further used to improve the accuracy of the proposed method. Four benchmark datasets (RPI2241, RPI488, RPI1807, and NPInter v2.0) were used for an unbiased evaluation of five prediction tools: RPI-Pred, IPMiner, RPISeq-RF, lncPro, and RPI-SAN. The experimental results show that RPI-SAN outperforms the other methods, achieving accuracies of 90.77%, 89.7%, 96.1%, and 99.33% on RPI2241, RPI488, RPI1807, and NPInter v2.0, respectively. We anticipate that RPI-SAN can serve as an effective computational tool for future biomedical research, accurately predicting potential ncRNA-protein interacting pairs and thereby providing reliable guidance for biological research.
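The feature-learning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sequences are encoded, so the 2-mer frequency encoding, the layer sizes, the learning rate, and all names here (`kmer_frequencies`, `AutoEncoderLayer`, `train_stacked`) are assumptions made for illustration. The sketch trains auto-encoder layers greedily, one on top of another, and returns the final hidden codes; in a pipeline like the one described, such codes would then be passed to a random forest classifier, which is not shown.

```python
import itertools
import math
import random


def kmer_frequencies(seq, alphabet="ACGU", k=2):
    """Normalized k-mer frequency vector (an assumed encoding, not from the paper)."""
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing unknown symbols
            counts[j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AutoEncoderLayer:
    """One auto-encoder layer: sigmoid encoder, linear decoder, squared error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def loss(self, data):
        """Mean squared reconstruction error over the data set."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def fit(self, data, epochs=300, lr=0.1):
        """Plain per-sample gradient descent on the reconstruction error."""
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                y = self.decode(h)
                err = [yi - xi for yi, xi in zip(y, x)]
                # Back-propagate through the linear decoder and sigmoid encoder
                # (computed before the decoder weights are updated).
                dh = [sum(err[o] * self.w2[o][j] for o in range(len(err)))
                      * h[j] * (1.0 - h[j]) for j in range(len(h))]
                for o, e in enumerate(err):
                    for j, hj in enumerate(h):
                        self.w2[o][j] -= lr * e * hj
                    self.b2[o] -= lr * e
                for j, d in enumerate(dh):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d * xi
                    self.b1[j] -= lr * d


def train_stacked(data, hidden_sizes, epochs=300, lr=0.1):
    """Greedy layer-wise training: each layer learns to encode the previous codes."""
    layers, codes = [], data
    for n_hidden in hidden_sizes:
        layer = AutoEncoderLayer(len(codes[0]), n_hidden)
        layer.fit(codes, epochs, lr)
        codes = [layer.encode(x) for x in codes]
        layers.append(layer)
    return layers, codes


# Toy RNA sequences, made up for illustration only.
rna_seqs = ["AUGGCUACGUAGC", "GGGCCCAUAUAUA", "UUUUAACCGGAUC", "ACGUACGUACGUA"]
features = [kmer_frequencies(s) for s in rna_seqs]
layers, codes = train_stacked(features, hidden_sizes=[8, 4])
```

Each entry of `codes` is a 4-dimensional learned representation of one sequence. Protein sequences could be handled the same way by passing a 20-letter amino-acid alphabet to `kmer_frequencies`, again as an assumption rather than the published encoding.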