RNALoc-LM: RNA subcellular localization prediction using pre-trained RNA language model.

Authors

Zeng Min, Zhang Xinyu, Li Yiming, Lu Chengqian, Yin Rui, Guo Fei, Li Min

Affiliations

School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.

School of Computer Science, Key Laboratory of Intelligent Computing and Information Processing, Xiangtan University, Xiangtan, Hunan 411105, China.

Publication information

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf127.

DOI:10.1093/bioinformatics/btaf127

PMID:40119908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11978386/

Abstract

MOTIVATION

Accurately predicting RNA subcellular localization is crucial for understanding the cellular functions and regulatory mechanisms of RNAs. Although many computational methods have been developed to predict the subcellular localization of lncRNAs, miRNAs, and circRNAs, very few are designed to predict the subcellular localization of multiple RNA types simultaneously. In addition, pre-trained RNA language models have shown remarkable performance in various bioinformatics tasks, such as structure prediction and functional annotation. Despite these advances, there remains a significant gap in applying pre-trained RNA language models to RNA subcellular localization prediction.

RESULTS

In this study, we propose RNALoc-LM, the first interpretable deep-learning framework that leverages a pre-trained RNA language model for predicting RNA subcellular localization. RNALoc-LM uses a pre-trained RNA language model to encode RNA sequences, then captures local patterns and long-range dependencies through TextCNN and BiLSTM modules. A multi-head attention mechanism is used to focus on important regions within the RNA sequences. The results demonstrate that RNALoc-LM significantly outperforms both deep-learning baselines and existing state-of-the-art predictors. Additionally, motif analysis highlights RNALoc-LM's potential for discovering important motifs, while an ablation study confirms the effectiveness of the RNA sequence embeddings generated by the pre-trained RNA language model.
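
To make the described pipeline concrete, the following is a minimal PyTorch sketch of how language-model embeddings could flow through TextCNN, BiLSTM, and multi-head attention stages. It is not the authors' implementation (see the linked repository for that). The class name `LocalizationSketch`, the embedding size, kernel sizes, hidden sizes, head count, number of classes, and the mean pooling step are all assumptions chosen for illustration, and a random tensor stands in for the output of the pre-trained RNA language model.

```python
import torch
import torch.nn as nn


class LocalizationSketch(nn.Module):
    """Hypothetical TextCNN -> BiLSTM -> multi-head attention classifier."""

    def __init__(self, embed_dim=640, conv_channels=64, kernel_sizes=(3, 5, 7),
                 lstm_hidden=64, num_heads=4, num_classes=4):
        super().__init__()
        # Parallel 1-D convolutions (odd kernels, length-preserving padding)
        # capture local sequence patterns at several widths.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, conv_channels, k, padding=k // 2)
             for k in kernel_sizes]
        )
        # A bidirectional LSTM models long-range dependencies.
        self.lstm = nn.LSTM(conv_channels * len(kernel_sizes), lstm_hidden,
                            batch_first=True, bidirectional=True)
        # Self-attention lets the model weight important positions.
        self.attn = nn.MultiheadAttention(2 * lstm_hidden, num_heads,
                                          batch_first=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, embeddings):
        # embeddings: (batch, length, embed_dim), one vector per nucleotide,
        # assumed to come from a pre-trained RNA language model.
        x = embeddings.transpose(1, 2)            # (batch, embed_dim, length)
        x = torch.cat([torch.relu(conv(x)) for conv in self.convs], dim=1)
        x = x.transpose(1, 2)                     # (batch, length, channels)
        x, _ = self.lstm(x)                       # (batch, length, 2 * hidden)
        attended, weights = self.attn(x, x, x)    # weights: (batch, len, len)
        pooled = attended.mean(dim=1)             # assumed pooling choice
        return self.classifier(pooled), weights


if __name__ == "__main__":
    model = LocalizationSketch()
    fake_embeddings = torch.randn(2, 50, 640)     # stand-in for LM output
    logits, attention = model(fake_embeddings)
    print(logits.shape, attention.shape)
```

The attention weights returned alongside the logits are the kind of signal that an interpretability or motif analysis could inspect, although how RNALoc-LM actually derives motifs is not specified in the abstract.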

AVAILABILITY AND IMPLEMENTATION

The RNALoc-LM web server is available at http://csuligroup.com:8000/RNALoc-LM. The source code can be obtained from https://github.com/CSUBioGroup/RNALoc-LM.
