Interpretable prediction models for widespread m6A RNA modification across cell lines and tissues.
Affiliations
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.
Publication information
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad709.
MOTIVATION
RNA N6-methyladenosine (m6A) in Homo sapiens plays vital roles in a variety of biological functions. Precise identification of m6A modifications is thus essential for elucidating their biological functions and underlying molecular-level mechanisms. Currently available high-throughput, single-nucleotide-resolution m6A modification data have considerably accelerated the identification of RNA modification sites through the development of data-driven computational methods. Nevertheless, existing methods cover only a limited number of cell lines at single-nucleotide resolution and offer little model interpretability, which limits their applicability.
RESULTS
In this study, we present CLSM6A, a set of deep learning-based models for predicting single-nucleotide-resolution m6A RNA modification sites across eight different cell lines and three tissues. Extensive benchmarking experiments on well-curated datasets show that CLSM6A outperforms current state-of-the-art methods. Furthermore, CLSM6A can interpret its prediction decision-making process by extracting critical motifs activated by filters and pinpointing the positions that receive the most attention in both forward and backward propagation. CLSM6A exhibits better portability on similar cross-cell line/tissue datasets, reveals a strong association between highly activated motifs and high-impact motifs, and demonstrates the complementary attributes of different interpretation strategies.
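The idea of extracting motifs activated by filters can be illustrated with a minimal sketch. This is not the CLSM6A implementation: the function names, the half-of-maximum activation cutoff, and the toy sequences are assumptions made for illustration, and a hand-written kernel stands in for a trained convolutional filter. The sketch slides one filter over one-hot encoded RNA, keeps the windows that activate it strongly, and summarizes them as a position frequency matrix.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed nucleotide ordering for the one-hot encoding


def one_hot(seq):
    """Encode an RNA string over A/C/G/U as a (len, 4) one-hot matrix."""
    idx = np.array([ALPHABET.index(ch) for ch in seq])
    return np.eye(4)[idx]


def filter_activations(x, kernel):
    """Slide a (width, 4) kernel along a (len, 4) sequence without padding."""
    width = kernel.shape[0]
    return np.array([np.sum(x[i:i + width] * kernel)
                     for i in range(len(x) - width + 1)])


def motif_from_filter(seqs, kernel, threshold_frac=0.5):
    """Position frequency matrix of windows that strongly activate the kernel.

    A window is kept when its activation exceeds threshold_frac times the
    largest activation seen over all sequences (an illustrative cutoff).
    """
    width = kernel.shape[0]
    encoded = [one_hot(s) for s in seqs]
    acts = [filter_activations(x, kernel) for x in encoded]
    cutoff = threshold_frac * max(a.max() for a in acts)
    counts = np.zeros((width, 4))
    for x, a in zip(encoded, acts):
        for i in np.flatnonzero(a > cutoff):
            counts += x[i:i + width]
    if counts.sum() == 0:
        raise ValueError("no window exceeded the activation cutoff")
    return counts / counts.sum(axis=1, keepdims=True)


def consensus(pfm):
    """Most frequent nucleotide at each motif position."""
    return "".join(ALPHABET[i] for i in pfm.argmax(axis=1))


# Toy example: a hand-written kernel that responds to GGACU.
toy_kernel = one_hot("GGACU")
toy_seqs = ["AAGGACUAA", "CCGGACUCC", "UUUUUUUUU"]
print(consensus(motif_from_filter(toy_seqs, toy_kernel)))  # prints GGACU
```

In a trained model the kernel would come from a learned first-layer convolution, and the resulting matrix could be compared with known m6A consensus motifs.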
AVAILABILITY AND IMPLEMENTATION
The webserver is available at http://csbio.njust.edu.cn/bioinf/clsm6a. The datasets and code are available at https://github.com/zhangying-njust/CLSM6A/.