Wang Rulan, Chung Chia-Ru, Lee Tzong-Yi
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan.
Int J Mol Sci. 2024 Mar 1;25(5):2869. doi: 10.3390/ijms25052869.
RNA modifications play a crucial role in cellular regulation. Despite extensive research, the traditional high-throughput sequencing methods used to elucidate their functional mechanisms remain time-consuming and labor-intensive. Moreover, existing methods often focus on specific species and neglect the simultaneous exploration of RNA modifications across diverse species. A versatile computational approach is therefore needed for interpretable analysis of RNA modifications across species. Here, a multi-scale biological language-based deep learning model is proposed for interpretable, sequential-level prediction of diverse RNA modifications. In benchmark comparisons across species, the model outperforms current state-of-the-art methods in predicting various RNA methylation types. Cross-species validation and attention weight visualization further highlight the model's ability to capture sequential and functional semantics from genomic backgrounds. Our analysis points to the potential existence of "biological grammars" in each modification type, which could be useful for mapping methylation-related sequential patterns and understanding the underlying biological mechanisms of RNA modifications.
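The abstract does not say how the "multi-scale biological language" is constructed. One common reading, offered here only as an assumption, is that an RNA sequence is split into overlapping k-mer "words" at several values of k before being passed to a model. The sketch below illustrates that idea; the function names `kmer_tokens` and `multi_scale_tokens` and the default scales (1, 3, 5) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def kmer_tokens(seq: str, k: int) -> List[str]:
    """Split a sequence into overlapping k-mers (stride 1).

    Input is upper-cased and DNA-style 'T' is mapped to RNA 'U'.
    Returns an empty list when k is longer than the sequence.
    """
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multi_scale_tokens(seq: str, scales: Sequence[int] = (1, 3, 5)) -> Dict[int, List[str]]:
    """Tokenize the same sequence at several k-mer sizes.

    The scales are an illustrative assumption, not the authors' settings.
    """
    return {k: kmer_tokens(seq, k) for k in scales}


if __name__ == "__main__":
    # A short window containing the DRACH-like motif "GGACU".
    for k, words in multi_scale_tokens("GGACU", scales=(1, 3)).items():
        print(k, words)
```

Each scale yields its own token stream, so a downstream model could in principle attend to single nucleotides and longer motifs at the same time; how the published model actually combines scales is not described in this abstract.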