LOCAS: multilabel mRNA localization with supervised contrastive learning.

Author information

Rahman Abir Abrar, Toki Tahmid Md, Saifur Rahman M

Affiliation

Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf441.

DOI:10.1093/bib/bbaf441

PMID:40862519

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12381764/

Abstract

The subcellular localization of messenger RNAs (mRNAs) plays a crucial role in gene regulation, ensuring precise spatial and temporal control of protein synthesis. Traditional computational approaches for mRNA localization have primarily relied on single-label classification models, which fail to capture the inherent multi-label nature of mRNA localization. Recent advancements have introduced deep learning-based multi-label prediction frameworks; however, existing methods often lack an effective way to model the relationships between multiple localizations. In this paper, we propose Localization with Supervised Contrastive Learning (LOCAS), a novel approach for multi-label mRNA subcellular localization prediction. LOCAS integrates an RNA language model (RiNALMo) to generate high-quality sequence embeddings and employs supervised contrastive learning (SCL) to refine the embedding space, ensuring biologically meaningful clustering of RNA sequences. To handle overlapping labels, we introduce an overlap-threshold-based similarity measure during contrastive training. Finally, we leverage an ML-Decoder, which utilizes a cross-attention mechanism to enhance multi-label classification performance. We evaluate LOCAS on two benchmark datasets, RNALocate and RNALocate V2.0, demonstrating state-of-the-art performance across all evaluation metrics. Extensive ablation studies validate the effectiveness of our approach, highlighting the contributions of contrastive learning and the ML-Decoder in improving multi-label classification. Our results suggest that integrating RNA sequence representation learning with SCL offers a powerful and scalable solution for mRNA localization prediction.
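The abstract does not spell out how the overlap-threshold-based similarity measure is defined, so the following is only an illustrative sketch, not the authors' implementation. It assumes that two sequences count as a positive pair when the Jaccard overlap of their label sets reaches a threshold, and it plugs that rule into a standard supervised contrastive loss. The function names, the Jaccard choice, and the default `threshold` and `temperature` values are all assumptions made for illustration; the embeddings would come from a model such as RiNALMo, but here they are plain arrays.

```python
import numpy as np


def jaccard_overlap(labels: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard overlap between multi-hot label rows (assumed measure)."""
    labels = labels.astype(float)
    inter = labels @ labels.T                      # shared labels per pair
    counts = labels.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    # Pairs with an empty union (no labels at all) get overlap 0.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


def overlap_supcon_loss(embeddings, labels, threshold=0.5, temperature=0.1):
    """Supervised contrastive loss with overlap-thresholded positives.

    Hypothetical sketch: a pair (i, j) is positive when the Jaccard overlap of
    their label sets is >= threshold. Expects at least two non-zero embeddings.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    positives = (jaccard_overlap(labels) >= threshold) & not_self

    # Cosine similarities scaled by temperature; self-pairs are masked out.
    logits = np.where(not_self, (z @ z.T) / temperature, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    pos_counts = positives.sum(axis=1)
    has_pos = pos_counts > 0
    if not has_pos.any():
        return 0.0  # no positive pairs in this batch
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    # Average over positives per anchor, then over anchors that have positives.
    return float(-(pos_log_prob[has_pos] / pos_counts[has_pos]).mean())
```

Under this assumed rule, raising `threshold` makes the positive set stricter (only near-identical label sets attract each other), while lowering it lets partially overlapping localizations pull together in the embedding space.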
