School of Information Science and Technology, Northeast Normal University.
School of Artificial Intelligence, Jilin University.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa274.
Circular RNAs (circRNAs) are widely expressed in eukaryotes. Genome-wide interactions between circRNAs and RNA-binding proteins (RBPs) can be probed with cross-linking immunoprecipitation followed by sequencing, and computational methods have therefore been developed to identify RBP binding sites on circRNAs. Unfortunately, those methods often suffer from feature representations with low discriminative power, numerical instability and poor scalability. To address these limitations, we propose iCircRBP-DHN, a novel computational method that uses a deep hierarchical network to discriminate circRNA-RBP binding sites. The architecture can be regarded as a deep multi-scale residual network followed by bidirectional gated recurrent units (BiGRUs) with a self-attention mechanism, which together extract local and global contextual information. We also propose novel encoding schemes that integrate CircRNA2Vec with the K-tuple nucleotide frequency pattern to represent different degrees of nucleotide dependency. To validate the effectiveness of iCircRBP-DHN, we compared its performance with other computational methods on 37 circRNA datasets and 31 linear RNA datasets. The experimental results show that iCircRBP-DHN outperforms those state-of-the-art algorithms. Moreover, we perform motif analysis on circRNAs bound by the different RBPs, which suggests that the proposed CircRNA2Vec encoding scheme is promising. The iCircRBP-DHN method is available at https://github.com/houzl3416/iCircRBP-DHN.
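To make the K-tuple nucleotide frequency idea concrete, the following is a minimal sketch of one plausible reading: sliding-window k-mer frequencies computed for several values of k and concatenated into a single vector. It is not taken from the iCircRBP-DHN code base. The function names `ktuple_frequencies` and `knfp_vector`, the choice of k = 1, 2, 3, and the sequence-level normalisation are all assumptions made for illustration; the published method may define the pattern differently.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T is mapped to U below


def ktuple_frequencies(seq, k):
    """Return the relative frequency of every k-tuple over ACGU in seq.

    Hypothetical helper: windows containing characters outside the
    alphabet (for example N) are skipped rather than counted.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def knfp_vector(seq, ks=(1, 2, 3)):
    """Concatenate k-tuple frequencies for each k in ks (assumed: 1, 2, 3).

    The result has 4**1 + 4**2 + 4**3 = 84 entries for the default ks,
    ordered alphabetically within each k.
    """
    vector = []
    for k in ks:
        freqs = ktuple_frequencies(seq, k)
        vector.extend(freqs[kmer] for kmer in sorted(freqs))
    return vector


if __name__ == "__main__":
    features = knfp_vector("ACGUACGGAUCC")
    print(len(features))
```

Small k captures single-nucleotide composition, while larger k captures short-range dependencies between neighbouring nucleotides, which is the sense in which such a pattern can represent "different degrees" of dependency. A learned embedding such as CircRNA2Vec would complement these counts rather than replace them.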