Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.
Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
Bioinformatics. 2018 Sep 1;34(17):i638-i646. doi: 10.1093/bioinformatics/bty600.
The complexes formed by binding of proteins to RNAs play key roles in many biological processes, such as splicing, gene expression regulation, translation and viral replication. Understanding protein-RNA binding may thus provide important insights into the functionality and dynamics of many cellular processes. This has sparked substantial interest in exploring protein-RNA binding experimentally and predicting it computationally. The key computational challenge is to efficiently and accurately infer protein-RNA binding models that enable the prediction of novel protein-RNA interactions with additional transcripts of interest.
We developed DLPRB (Deep Learning for Protein-RNA Binding), a new deep neural network (DNN) approach for learning intrinsic protein-RNA binding preferences and predicting novel interactions. We present two network architectures: a convolutional neural network (CNN) and a recurrent neural network (RNN). The novelty of our approach rests on two key aspects: (i) the joint analysis of RNA sequence and structure, where structure is represented as a probability vector over different RNA structural contexts; (ii) novel architectural features, such as the application of RNNs to RNA-binding prediction and the combination of hundreds of variable-length filters in the CNN. In inferring accurate RNA-binding models from high-throughput in vitro data, our networks achieve substantial improvements over all previous approaches for protein-RNA binding prediction, both DNN-based and non-DNN-based. A more modest, yet statistically significant, improvement is achieved for in vivo binding prediction, and this improvement increases when experimentally measured RNA structure is used instead of predicted structure. By visualizing the binding specificities, we can gain biological insights into the mechanism of protein-RNA binding.
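The joint sequence-and-structure input and the CNN's combination of variable-length filters can be illustrated with a small forward-pass sketch. It is not taken from the DLPRB code: the five structural-context labels (`P`, `H`, `I`, `M`, `E` for paired, hairpin loop, internal loop, multiloop and external region), the per-position encoding, the hand-set filter weights, and all function names (`encode`, `conv_relu_maxpool`, `multi_width_features`) are hypothetical. Here each position is encoded as a one-hot nucleotide vector concatenated with a probability vector over structural contexts; filters of several widths are slid along the sequence, passed through a ReLU, and reduced by global max pooling, and the pooled values are concatenated into one feature vector. In a trained network the filter weights would be learned rather than set by hand.

```python
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGU"
# Hypothetical structural contexts for illustration: paired, hairpin loop,
# internal loop, multiloop, external region.
CONTEXTS = ("P", "H", "I", "M", "E")

# A matrix is a list of rows: one row per sequence position, one column per channel.
Matrix = List[List[float]]


def encode(seq: str, structure: Sequence[Sequence[float]]) -> Matrix:
    """Concatenate a one-hot nucleotide vector (4 channels) with a per-position
    probability vector over the structural contexts (5 channels)."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure profile lengths differ")
    rows: Matrix = []
    for base, probs in zip(seq.upper(), structure):
        if len(probs) != len(CONTEXTS):
            raise ValueError("expected one probability per structural context")
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("structural context probabilities must sum to 1")
        one_hot = [1.0 if base == n else 0.0 for n in NUCLEOTIDES]
        rows.append(one_hot + [float(p) for p in probs])
    return rows


def conv_relu_maxpool(x: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """Slide one filter along the sequence and return the maximum over positions
    of ReLU(dot product + bias): a convolution followed by global max pooling.
    A filter wider than the sequence yields no windows and returns 0.0."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is the floor
    for start in range(len(x) - width + 1):
        score = bias
        for offset, weights in enumerate(kernel):
            score += sum(w * v for w, v in zip(weights, x[start + offset]))
        best = max(best, score)
    return best


def multi_width_features(x: Matrix, filters: Dict[int, List[Matrix]]) -> List[float]:
    """Apply filters of several widths and concatenate their pooled outputs,
    ordered by width, into a single feature vector."""
    features: List[float] = []
    for width in sorted(filters):
        for kernel in filters[width]:
            if len(kernel) != width:
                raise ValueError(f"kernel listed under width {width} has {len(kernel)} rows")
            features.append(conv_relu_maxpool(x, kernel))
    return features


# Toy example with hand-set (not learned) weights.
structure = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # G, paired
    [1.0, 0.0, 0.0, 0.0, 0.0],  # C, paired
    [0.0, 1.0, 0.0, 0.0, 0.0],  # A, hairpin loop
    [0.0, 1.0, 0.0, 0.0, 0.0],  # U, hairpin loop
    [1.0, 0.0, 0.0, 0.0, 0.0],  # G, paired
    [1.0, 0.0, 0.0, 0.0, 0.0],  # C, paired
]
filters = {
    1: [[[0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]]],  # U inside a hairpin loop
    2: [[[1.0] + [0.0] * 8, [0.0, 0.0, 0.0, 1.0] + [0.0] * 5]],  # A followed by U
    3: [[[0.0] * 4 + [-1.0] + [0.0] * 4 for _ in range(3)]],  # penalizes paired bases
}
x = encode("GCAUGC", structure)
print(multi_width_features(x, filters))  # [2.0, 2.0, 0.0]
```

Global max pooling is one common way to let filters of different widths each contribute a single value, giving a fixed-length feature vector regardless of sequence length; whether DLPRB pools in exactly this way is an assumption of this sketch.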
The source code is publicly available at https://github.com/ilanbb/dlprb.
Supplementary data are available at Bioinformatics online.