Rahman Abir Abrar, Toki Tahmid Md, Saifur Rahman M
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf441.
The subcellular localization of messenger RNAs (mRNAs) plays a crucial role in gene regulation, ensuring precise spatial and temporal control of protein synthesis. Traditional computational approaches for predicting mRNA localization have primarily relied on single-label classification models, which fail to capture the inherently multi-label nature of mRNA localization. Recent advancements have introduced deep learning-based multi-label prediction frameworks; however, existing methods often lack an effective way to model the relationships between multiple localizations. In this paper, we propose Localization with Supervised Contrastive Learning (LOCAS), a novel approach for multi-label mRNA subcellular localization prediction. LOCAS integrates an RNA language model (RiNALMo) to generate high-quality sequence embeddings and employs supervised contrastive learning (SCL) to refine the embedding space, encouraging biologically meaningful clustering of RNA sequences. To handle overlapping labels, we introduce an overlap-threshold-based similarity measure during contrastive training. Finally, we leverage an ML-Decoder, which uses a cross-attention mechanism to enhance multi-label classification performance. We evaluate LOCAS on two benchmark datasets, RNALocate and RNALocate V2.0, demonstrating state-of-the-art performance across all evaluation metrics. Extensive ablation studies validate the effectiveness of our approach, highlighting the contributions of contrastive learning and the ML-Decoder in improving multi-label classification. Our results suggest that integrating RNA sequence representation learning with SCL offers a powerful and scalable solution for mRNA localization prediction.
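The abstract does not spell out the overlap-threshold-based similarity measure, so the following is only a minimal sketch of one plausible reading, not the LOCAS implementation. It assumes that two sequences count as a positive pair during contrastive training when the Jaccard overlap of their localization label sets reaches a threshold, and then applies a standard supervised contrastive loss. The function names (`jaccard`, `positive_mask`, `supcon_loss`), the use of Jaccard overlap, and the default `threshold` and `temperature` values are all assumptions made for illustration.

```python
import math


def jaccard(a, b):
    """Jaccard overlap between two binary label vectors (assumed overlap measure)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def positive_mask(labels, threshold=0.5):
    """mask[i][j] is True when samples i and j (i != j) overlap enough to be positives."""
    n = len(labels)
    return [
        [i != j and jaccard(labels[i], labels[j]) >= threshold for j in range(n)]
        for i in range(n)
    ]


def supcon_loss(embeddings, labels, threshold=0.5, temperature=0.1):
    """Supervised contrastive loss with positives chosen by label overlap."""
    # L2-normalise embeddings so dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    mask = positive_mask(labels, threshold)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if mask[i][j]]
        if not positives:
            continue  # anchors without any positive pair contribute nothing
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in others
        }
        # Numerically stable log-sum-exp over all other samples in the batch.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With a threshold of 0.5, a transcript labelled for two compartments and one labelled for only one of them are treated as positives (Jaccard 0.5), whereas transcripts with disjoint label sets are not. Raising the threshold makes the positive set stricter, approaching exact label-set matching at 1.0.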