Yuan Liangliang, Yang Yang
Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University, Shanghai, China.
Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China.
Front Genet. 2021 Jan 22;11:632861. doi: 10.3389/fgene.2020.632861. eCollection 2020.
Circular RNAs (circRNAs), a rising star of the RNA world, play important roles in various biological processes. Understanding the interactions between circRNAs and RNA-binding proteins (RBPs) can help reveal the functions of circRNAs. Over the past decade, high-throughput experimental data, such as CLIP-Seq, have made it possible to identify RNA-protein interactions (RPIs) computationally with machine learning methods. However, because the mechanisms underlying RPIs are not yet fully understood and the information sources for circRNAs are limited, very few computational tools exist for predicting circRNA-RBP interactions. In this study, we propose DeCban, a deep learning method for identifying circRNA-RBP interactions that features hybrid double embeddings for representing RNA sequences and a cross-branch attention neural network for classification. To capture more information from RNA sequences, the double embeddings comprise pre-trained embedding vectors for both RNA segments and their converted amino acids. The cross-branch attention network, in turn, addresses the learning of very long sequences by integrating features at different scales and focusing on important information. Experimental results on 37 benchmark datasets show that both the double embeddings and the cross-branch attention model contribute to improved performance. DeCban outperforms mainstream deep learning-based methods in both prediction accuracy and computational efficiency. The datasets and source code of this study are freely available at https://github.com/AaronYll/DECban.
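To make the "double embedding" input concrete, the following is a minimal sketch of how one RNA sequence could be turned into two parallel token streams, one of RNA segments and one of the amino acids those segments encode. It is an illustration under stated assumptions, not the DeCban implementation: the function names (`rna_segments`, `to_amino_acids`, `double_tokens`), the segment length of 3, the stride of 1, and the use of the standard genetic code for the conversion are all hypothetical choices made here. In a full pipeline, each token stream would then be mapped to pre-trained embedding vectors.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G base order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rna_segments(seq, k=3, stride=1):
    """Split an RNA sequence into overlapping segments (k-mers).

    k and stride are assumptions chosen for illustration.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def to_amino_acids(segments):
    """Convert each 3-nucleotide segment to its amino acid letter.

    Segments containing unknown bases (e.g. "N") map to "X".
    """
    return [CODON_TABLE.get(segment, "X") for segment in segments]


def double_tokens(seq):
    """Return the two token streams that would feed the two embeddings."""
    segments = rna_segments(seq, k=3, stride=1)
    return segments, to_amino_acids(segments)


if __name__ == "__main__":
    segments, amino_acids = double_tokens("AUGGCCUAA")
    print(segments)
    print(amino_acids)
```

Both streams have the same length, so the two embedding sequences stay position-aligned and can be combined per position before classification. How the original method segments sequences and merges the two embeddings should be checked against the released source code.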