Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea.
School of Software, Shandong University, Jinan 250101, China.
Cells. 2019 Oct 28;8(11):1332. doi: 10.3390/cells8111332.
DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand the biological functions of 4mC, it is crucial to gain knowledge of its genomic distribution. Recently, a few computational studies, in particular machine learning (ML) approaches, have been applied to the prediction of 4mC sites. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mCs in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome; it combines four different ML algorithms with seven feature encodings. The probabilistic values predicted from those feature encodings are then used as a feature vector and input once again to the ML algorithms, whose corresponding models are integrated by ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved accuracy and Matthews correlation coefficient (MCC) values of 0.795 and 0.591, significantly outperforming seven other classifiers by 1.5-5.9% and 3.2-11.7%, respectively. Additionally, 4mCpred-EL attained an overall accuracy of 79.80% in the independent evaluation, which is 1.8-5.1% higher than that yielded by the seven other classifiers. We provide a user-friendly web server, also named 4mCpred-EL, which could be used as a pre-screening tool for the identification of potential 4mC sites in the mouse genome.
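The two-layer idea described in the abstract (per-encoding models whose predicted probabilities become a new feature vector for a second round of learning) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the two toy encodings, the `CentroidClassifier` stand-in learner, the function names, and the four made-up sequences. The actual method uses seven encodings and four ML algorithms that the abstract does not name, and a real pipeline would derive the layer-1 probabilities by cross-validation rather than on the training data itself.

```python
import math


def mono_composition(seq):
    """Toy encoding 1 (assumed): fraction of each nucleotide in the sequence."""
    return [seq.count(base) / len(seq) for base in "ACGT"]


def binary_profile(seq):
    """Toy encoding 2 (assumed): one-hot vector for every position."""
    return [1.0 if char == base else 0.0 for char in seq for base in "ACGT"]


class CentroidClassifier:
    """Hypothetical stand-in learner; not one of the paper's algorithms."""

    def fit(self, X, y):
        # Store the mean feature vector (centroid) of each class.
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, target in zip(X, y) if target == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, X):
        # Sigmoid of the distance difference: nearer to class 1 gives p > 0.5.
        probs = []
        for x in X:
            d0 = math.dist(x, self.centroids[0])
            d1 = math.dist(x, self.centroids[1])
            probs.append(1.0 / (1.0 + math.exp(d1 - d0)))
        return probs


def build_probability_features(seqs, labels, encoders):
    """Layer 1: fit one model per encoding and return a function that maps
    sequences to a vector of predicted probabilities (one per encoding)."""
    models = [
        CentroidClassifier().fit([enc(s) for s in seqs], labels)
        for enc in encoders
    ]

    def to_features(batch):
        columns = [
            model.predict_proba([enc(s) for s in batch])
            for model, enc in zip(models, encoders)
        ]
        return [list(row) for row in zip(*columns)]

    return to_features


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Made-up toy sequences (1 = positive, 0 = negative); not real 4mC data.
seqs = ["GGCCCGCG", "CGCCCGGC", "ATTCATAT", "TATCTTAA"]
labels = [1, 1, 0, 0]

to_features = build_probability_features(
    seqs, labels, [mono_composition, binary_profile]
)
# Layer 2: a second learner trained on the probability feature vector.
meta_model = CentroidClassifier().fit(to_features(seqs), labels)
preds = [1 if p >= 0.5 else 0 for p in meta_model.predict_proba(to_features(seqs))]
print(accuracy_and_mcc(labels, preds))
```

In the scheme the abstract describes, the second layer would be fitted with each of the four ML algorithms and the resulting models combined into one ensemble; the single meta-model here only shows where that step sits. The `accuracy_and_mcc` helper computes the two metrics the abstract reports.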