Department of Medical Informatics, Erasmus Medical Center, Rotterdam, The Netherlands.
Institute of Software Engineering, East China Normal University, Shanghai, China.
BMC Genomics. 2018 Jul 3;19(1):511. doi: 10.1186/s12864-018-4889-1.
RNA regulation depends heavily on binding protein partners, known as RNA-binding proteins (RBPs). However, the binding preferences of most RBPs are still not well characterized. Because sequence and secondary-structure specificities are interdependent, it is challenging both to predict RBP binding sites and to detect sequence and structure motifs accurately.
In this study, we propose a deep learning-based method, iDeepS, that identifies binding sequence and structure motifs from RNA sequences simultaneously. It uses convolutional neural networks (CNNs) and a bidirectional long short-term memory network (BLSTM). We first one-hot encode both the sequence and its predicted secondary structure so that convolutions can be applied. The CNNs then learn abstract features that reveal hidden binding information in the observed sequences. Because sequence and predicted structure are closely related, we use the BLSTM to capture possible long-range dependencies between the binding sequence motifs and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale, representative CLIP-seq datasets. The results demonstrate that iDeepS reliably predicts RBP binding sites on RNAs and outperforms state-of-the-art methods. An important advantage over other methods is that iDeepS automatically extracts both binding sequence motifs and binding structure motifs, which will improve our understanding of the mechanisms behind RBP binding specificities.
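The input step described above, one-hot encoding of sequence and predicted structure followed by convolution, can be illustrated with the minimal NumPy sketch below. It is not the iDeepS implementation. The dot-bracket structure alphabet, the functions `one_hot`, `encode` and `conv_scan`, and the hand-built `GCAUG` filter are hypothetical and chosen only for illustration. iDeepS may use a different structure encoding, and its filters are learned rather than fixed.

```python
import numpy as np

# Standard RNA nucleotide alphabet (one channel per base).
SEQ_ALPHABET = "ACGU"
# ASSUMPTION: dot-bracket symbols as the structure alphabet, for illustration
# only; iDeepS may encode predicted structure differently.
STRUCT_ALPHABET = "(.)"


def one_hot(symbols: str, alphabet: str) -> np.ndarray:
    """Return a (len(symbols), len(alphabet)) one-hot matrix.

    Symbols not in the alphabet (e.g. 'N') become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    mat = np.zeros((len(symbols), len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(symbols):
        if ch in index:
            mat[pos, index[ch]] = 1.0
    return mat


def encode(seq: str, struct: str) -> np.ndarray:
    """Stack sequence and structure one-hot channels into one (L, 4 + 3) matrix."""
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return np.concatenate(
        [one_hot(seq, SEQ_ALPHABET), one_hot(struct, STRUCT_ALPHABET)], axis=1
    )


def conv_scan(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a (K, C) filter over an (L, C) input without padding.

    Like a CNN layer, this is a cross-correlation:
    score[i] = sum(x[i:i+K] * kernel).
    """
    k = kernel.shape[0]
    return np.array(
        [float(np.sum(x[i:i + k] * kernel)) for i in range(x.shape[0] - k + 1)]
    )


if __name__ == "__main__":
    x = encode("AAGCAUGAA", "..((...))")
    print(x.shape)  # (9, 7)

    # Hypothetical filter rewarding the 5-mer GCAUG on the sequence channels;
    # structure channels get zero weight. A trained CNN learns such weights.
    motif = one_hot("GCAUG", SEQ_ALPHABET)
    kernel = np.concatenate([motif, np.zeros((5, len(STRUCT_ALPHABET)))], axis=1)

    scores = conv_scan(x, kernel)
    print(scores)        # [0. 0. 5. 0. 0.]
    print(scores.max())  # 5.0, i.e. max pooling over positions
```

In a full model, many such filters span both the sequence and structure channels. According to the abstract, iDeepS passes the CNN outputs to a BLSTM before classification. This sketch stops at a single convolution and max pooling.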
Our study shows that iDeepS identifies sequence and structure motifs that enable accurate prediction of RBP binding sites. iDeepS is available at https://github.com/xypan1232/iDeepS .