Miyake Hiroshi, Kawaguchi Risa Karakida, Kiryu Hisanori
Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8561, Japan.
Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Sakyo-ku 606-8507, Japan.
Bioinform Adv. 2024 Sep 28;4(1):vbae144. doi: 10.1093/bioadv/vbae144. eCollection 2024.
RNA-binding proteins (RBPs) play a crucial role in the post-transcriptional regulation of RNA. Given their importance, analyzing the specific RNA patterns recognized by RBPs has become a significant research focus in bioinformatics. Deep neural networks have improved the accuracy of RBP-binding site prediction, yet their limited interpretability makes it challenging to understand the structural basis of RBP-binding specificity from these models. To address this, we developed RNAelem, which combines a profile context-free grammar with the Turner energy model for RNA secondary structure to predict sequence-structure motifs in RBP-binding regions.
For RNA sequences with structural motifs, RNAelem exhibited higher detection accuracy than existing tools. Applying RNAelem to the eCLIP database, we not only reproduced many known primary sequence motifs in the absence of secondary structures, but also discovered many secondary structural motifs that contained sequence-nonspecific insertion regions. Furthermore, the high interpretability of RNAelem yielded findings such as long-range base-pairing interactions in the binding region of the U2AF protein.
The code is available at https://github.com/iyak/RNAelem.