Akbari Rokn Abadi Saeedeh, Shahbakhsh Aref, Koohi Somayyeh
Division of Computational Science and Technology, School of Electrical Engineering and Computer Science (EECS), KTH Royal Institute of Technology, Stockholm, Sweden.
Computer Engineering Department, Sharif University of Technology, Tehran, Iran.
Sci Rep. 2025 May 28;15(1):18709. doi: 10.1038/s41598-025-03485-8.
The localization of mRNA is crucial for the synthesis of functional proteins and plays a significant role in cellular processes. Understanding mRNA localization can enhance applications in disease diagnosis (e.g., cancer, Alzheimer's) and drug development. Numerous methods have been developed for this purpose, but existing approaches face challenges: experimental methods are often costly and time-consuming, while computational methods may lack accuracy and efficiency. To address these limitations, we propose LGLoc, a machine learning-based approach designed to improve the accuracy of mRNA localization predictions with low computational overhead. LGLoc employs a Graph Neural Network encoder that operates on the RNA's secondary structure, complemented by a BERT encoder that operates on the primary RNA sequence. Additionally, it integrates k-mer and nucleotide frequency-based encoders to capture essential sequence characteristics. Feature selection uses analysis of variance (ANOVA), and classification is performed by a one-vs-rest Naïve Bayes classifier tailored for mRNA classification. Our results indicate that LGLoc significantly outperforms existing methods, such as mRNALoc and MSLP, across key performance metrics including Accuracy, Sensitivity, Specificity, F1-score, AUC, and MCC. Notably, LGLoc improves average F1-score by more than 49% and average MCC by 26% compared to mRNALoc, demonstrating its reliability and effectiveness in mRNA subcellular localization.
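As an illustration of the k-mer and nucleotide frequency-based encoders mentioned above, the following is a minimal sketch in plain Python. It is not the LGLoc implementation: the function names, the choice of k, the normalization by window count, and the handling of ambiguous bases are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA-style "T" is mapped to "U" below


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Hypothetical helper: windows containing characters outside ACGU
    (for example "N") are ignored, which is an assumption of this sketch.
    """
    seq = seq.upper().replace("T", "U")
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize only over windows that belong to the vocabulary.
    total = sum(counts[kmer] for kmer in vocab)
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]


def nucleotide_frequencies(seq):
    """Single-nucleotide composition, i.e. the k = 1 special case."""
    return kmer_frequencies(seq, k=1)


# Concatenate both encodings into one feature vector (4 + 16 values here).
features = nucleotide_frequencies("AUGGCUAUGG") + kmer_frequencies("AUGGCUAUGG", k=2)
```

A vector built this way has a fixed length regardless of sequence length, which is what allows it to be concatenated with other encodings and passed to a feature-selection step such as ANOVA before classification.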