Department of Computer Science and Informatics, Oakland University, Rochester, MI, 48309, USA.
Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada.
Sci Rep. 2024 Oct 28;14(1):25688. doi: 10.1038/s41598-024-76148-9.
RNA 5-methyluridine (m5U) sites are important for understanding RNA modifications, which influence numerous biological processes such as gene expression and cellular function. Identifying m5U sites is therefore valuable for understanding the integrity, structure, and function of RNA molecules. In this study, we introduce GRUpred-m5U, a novel deep learning framework based on a gated recurrent unit (GRU), and apply it to mature RNA and full-transcript RNA datasets. We used three descriptor groups (nucleic acid composition, pseudo nucleic acid composition, and physicochemical properties) comprising five feature extraction methods: ENAC, Kmer, DPCP, DPCP type 2, and PseDNC. We first combined the features from all five methods into a new merged set. Three hybrid deep learning models were then developed and evaluated using 10-fold cross-validation with seven evaluation metrics. After a comprehensive evaluation, GRUpred-m5U outperformed the other models, achieving accuracies of 98.41% and 96.70% on the two datasets, respectively. To our knowledge, the proposed model outperforms all existing state-of-the-art methods. The supervised model was further examined with unsupervised techniques such as principal component analysis (PCA), which showed that the proposed method performs effectively in identifying m5U sites. Given its multi-layered architecture, GRUpred-m5U has considerable potential for future applications in the biological industry. Built from neurons that process complex inputs, the model excels at pattern recognition and produces reliable results. Despite its larger size, the model achieves accurate results, which is essential for detecting m5U sites.
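Two of the five descriptors, Kmer and ENAC, are simple composition counts over the nucleotide sequence. The plain-Python sketch below shows one common way to compute them and to concatenate them into a single merged vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, the default `k = 2`, the ENAC window of 5 nucleotides, the merge order, and mapping T to U are all hypothetical choices. DPCP, DPCP type 2, and PseDNC are omitted because they depend on physicochemical property tables that the abstract does not provide.

```python
from itertools import product

# RNA alphabet; input T is treated as U (assumption for DNA-style inputs).
NUCLEOTIDES = "ACGU"


def _normalize(seq: str) -> str:
    return seq.upper().replace("T", "U")


def kmer_features(seq: str, k: int = 2) -> list[float]:
    """Count of each of the 4**k possible k-mers divided by the number of k-length windows."""
    seq = _normalize(seq)
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    n_windows = len(seq) - k + 1
    for i in range(n_windows):
        window = seq[i:i + k]
        if window in counts:  # windows containing N or other symbols are skipped
            counts[window] += 1
    return [counts[km] / n_windows if n_windows > 0 else 0.0 for km in kmers]


def enac_features(seq: str, window: int = 5) -> list[float]:
    """Enhanced nucleic acid composition: A/C/G/U fractions inside each sliding window."""
    seq = _normalize(seq)
    features: list[float] = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        features.extend(segment.count(nt) / window for nt in NUCLEOTIDES)
    return features


def merged_features(seq: str, k: int = 2, window: int = 5) -> list[float]:
    """Concatenate the descriptors into one vector (hypothetical merge order)."""
    return kmer_features(seq, k) + enac_features(seq, window)


if __name__ == "__main__":
    vec = merged_features("ACGUACGUAC")
    print(len(vec))  # 40: 16 Kmer values + 24 ENAC values
```

For a 10-nt sequence, the merged vector holds 4^2 = 16 Kmer frequencies plus 6 windows × 4 nucleotides = 24 ENAC fractions. Concatenation keeps every descriptor's values intact, leaving the downstream model to learn which of them matter.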
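The gated recurrent unit at the core of GRUpred-m5U updates a hidden state one input at a time. An update gate z and a reset gate r decide how much of the previous state to keep. The pure-Python sketch below implements one common formulation of a single GRU step, with candidate n_t = tanh(W_n x_t + U_n (r_t * h_{t-1}) + b_n) and h_t = (1 - z_t) * h_{t-1} + z_t * n_t, and runs it over a sequence. Everything specific here is hypothetical: the one-hot per-nucleotide input, the hidden size, the random initialization, and the function names. The abstract does not give the model's input layout, layer sizes, or training procedure, and the actual model feeds the merged feature set to its GRU layers.

```python
import math
import random

# One-hot encoding of each nucleotide (hypothetical input layout for this sketch).
ONE_HOT = {"A": [1.0, 0.0, 0.0, 0.0], "C": [0.0, 1.0, 0.0, 0.0],
           "G": [0.0, 0.0, 1.0, 0.0], "U": [0.0, 0.0, 0.0, 1.0]}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gate(p, name, x, h, act):
    """Compute act(W x + U h + b) for the gate whose weights end in `name`."""
    return [act(a + b + c) for a, b, c in
            zip(matvec(p["W" + name], x), matvec(p["U" + name], h), p["b" + name])]


def gru_step(x, h, p):
    z = gate(p, "z", x, h, sigmoid)          # update gate
    r = gate(p, "r", x, h, sigmoid)          # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]   # reset applied to the previous state
    n = gate(p, "n", x, rh, math.tanh)       # candidate state
    # Blend old state and candidate: h_t = (1 - z) * h_{t-1} + z * n
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def init_params(input_size, hidden_size, seed=0):
    """Small random weights; a real model would learn these by training."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in ("z", "r", "n"):
        p["W" + g] = mat(hidden_size, input_size)
        p["U" + g] = mat(hidden_size, hidden_size)
        p["b" + g] = [0.0] * hidden_size
    return p


def gru_encode(seq, params, hidden_size):
    """Run the GRU over a nucleotide string and return the final hidden state."""
    h = [0.0] * hidden_size
    for nt in seq.upper().replace("T", "U"):
        h = gru_step(ONE_HOT[nt], h, params)
    return h


if __name__ == "__main__":
    params = init_params(input_size=4, hidden_size=8)
    print(gru_encode("ACGUUAGC", params, hidden_size=8))
```

In a full classifier, the final hidden state would typically feed an output layer that predicts m5U versus non-m5U. That layer, like the rest of this sketch, is an assumption and not a detail stated in the abstract.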