Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan.
Faculty of Computer Science, Arab Open University, Muscat, Sultanate of Oman.
Sci Prog. 2024 Jul-Sep;107(3):368504241266588. doi: 10.1177/00368504241266588.
A crucial stage in eukaryotic gene expression is mRNA splicing, carried out by a protein assembly known as the spliceosome. This step contributes significantly to the generation and proper functioning of the final gene product. Because non-coding introns interrupt eukaryotic genes, splicing entails removing the introns and joining the exons to create a functional mRNA molecule. Nevertheless, accurately locating splice sites using molecular biology techniques and other biological approaches is complex and time-consuming. This paper presents a precise and reliable computer-aided diagnosis (CAD) technique for the rapid and correct identification of splice site sequences. The proposed deep learning-based framework uses long short-term memory (LSTM) to extract distinct patterns from RNA sequences, enabling rapid and accurate mapping of point mutation sequences. The network takes one-hot encodings of the sequences as input and learns sequential patterns that effectively identify splice sites. A thorough ablation study covering traditional machine learning, one-dimensional convolutional neural network (1D-CNN), and recurrent neural network (RNN) models was conducted. The proposed LSTM network outperformed existing state-of-the-art approaches, improving accuracy by 3% and 2% on the acceptor and donor site datasets, respectively.
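The one-hot encoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `one_hot_encode`, the channel order `ACGU`, the all-zero row for ambiguous bases, and the example window are all assumptions made here for illustration, since the abstract does not specify them.

```python
# Minimal sketch of one-hot encoding for an RNA window (illustrative only).
# Assumption: channel order A, C, G, U; the abstract does not state the order used.
ALPHABET = "ACGU"


def one_hot_encode(seq, alphabet=ALPHABET):
    """Return a list of rows, one per nucleotide, each a 0/1 vector.

    DNA-style input is accepted by mapping T to U. Any symbol outside the
    alphabet (for example N) becomes an all-zero row; this is an assumed
    convention, not one taken from the paper.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Hypothetical 8-nucleotide window containing the canonical GU donor dinucleotide.
window = "AGGUAAGU"
matrix = one_hot_encode(window)
for base, row in zip(window, matrix):
    print(base, row)
```

Each window becomes a matrix of shape (sequence length, 4), which is the kind of input a sequence model such as an LSTM can read one position at a time.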