Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, South Korea.
Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University, United Arab Emirates.
Comput Biol Med. 2024 Nov;182:109087. doi: 10.1016/j.compbiomed.2024.109087. Epub 2024 Sep 3.
Epigenetic modifications, particularly RNA methylation and histone alterations, play a crucial role in heredity, development, and disease. Among these, RNA 5-methylcytosine (m5C) is the most prevalent RNA modification in mammalian cells and is essential for processes such as ribosome synthesis, translational fidelity, mRNA nuclear export, turnover, and translation. The growing volume of nucleotide sequence data has driven the development of machine learning-based predictors of m5C sites. However, these predictors often face challenges from limited training data and from overfitting, owing to insufficient external validation. This study introduces m5C-Seq, an ensemble learning approach for RNA modification profiling designed to address these issues. m5C-Seq employs a meta-classifier that makes the final prediction by integrating 15 probabilities, which are generated from a novel, large dataset using systematic encoding methods. m5C-Seq outperforms existing predictors and represents a significant advancement in accurate RNA modification profiling. The code and the newly established datasets are available on GitHub at https://github.com/Z-Abbas/m5C-Seq.
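The stacking idea described in the abstract (base predictors each emit a probability, and a meta-classifier combines those probabilities into a final prediction) can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the m5C-Seq implementation: the k-mer frequency encodings, the pure-Python logistic-regression learners, the helper names (`kmer_freq`, `fit_stacking`, `predict_stacking`), and the toy C-rich/A-rich sequences are all assumptions, and the sketch combines 3 probabilities rather than the 15 used by m5C-Seq.

```python
"""Hypothetical stacking sketch: a meta-classifier over base-model probabilities.

Assumptions (not from m5C-Seq): k-mer frequency encodings, logistic-regression
base learners and meta-learner, and synthetic toy sequences.
"""
import math
import random
from itertools import product

ENCODINGS = [1, 2, 3]  # hypothetical k-mer sizes standing in for the real encodings


def kmer_freq(seq, k):
    """Normalized k-mer frequency vector over the RNA alphabet ACGU."""
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    windows = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:
            counts[index[km]] += 1
    return [c / windows for c in counts]


def sigmoid(z):
    z = max(-500.0, min(500.0, z))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=1.0, epochs=300):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(a * c for a, c in zip(w, x)) + b)


def base_features(seq):
    """One feature vector per encoding; each feeds its own base model."""
    return [kmer_freq(seq, k) for k in ENCODINGS]


def fit_stacking(seqs, labels, n_folds=3):
    feats = [base_features(s) for s in seqs]
    n = len(seqs)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    # Out-of-fold probabilities: each sample is scored by base models that never saw it.
    meta_X = [[0.0] * len(ENCODINGS) for _ in range(n)]
    for held in folds:
        held_set = set(held)
        train_idx = [i for i in range(n) if i not in held_set]
        for m in range(len(ENCODINGS)):
            model = train_logreg([feats[i][m] for i in train_idx],
                                 [labels[i] for i in train_idx])
            for i in held:
                meta_X[i][m] = predict_proba(model, feats[i][m])
    # Refit base models on all data for use at prediction time.
    base_models = [train_logreg([f[m] for f in feats], labels)
                   for m in range(len(ENCODINGS))]
    meta_model = train_logreg(meta_X, labels)
    return base_models, meta_model


def predict_stacking(stack, seq):
    base_models, meta_model = stack
    probs = [predict_proba(bm, x) for bm, x in zip(base_models, base_features(seq))]
    return predict_proba(meta_model, probs)


def toy_sequence(rng, dominant, length=41):
    """Hypothetical toy data: each position is `dominant` with prob 0.7, else random."""
    return "".join(dominant if rng.random() < 0.7 else rng.choice("ACGU")
                   for _ in range(length))


# Usage on synthetic data: C-rich "positives" versus A-rich "negatives".
rng = random.Random(0)
seqs, labels = [], []
for _ in range(20):
    seqs.append(toy_sequence(rng, "C")); labels.append(1)
    seqs.append(toy_sequence(rng, "A")); labels.append(0)
stack = fit_stacking(seqs, labels)
print(round(predict_stacking(stack, "C" * 41), 3), round(predict_stacking(stack, "A" * 41), 3))
```

Training the meta-classifier on out-of-fold probabilities is a common stacking design choice: if it were trained on probabilities from base models fitted to the same samples, it would learn to trust overconfident in-sample scores. Scaling the sketch toward the abstract's setup would mean adding more encoding and learner combinations until each sequence yields 15 probabilities.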