Yu Lezheng, Zhang Yonglin, Xue Li, Liu Fengjuan, Chen Qi, Luo Jiesi, Jing Runyu
School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China.
Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China.
Front Microbiol. 2022 Mar 15;13:843425. doi: 10.3389/fmicb.2022.843425. eCollection 2022.
DNA N4-methylcytosine (4mC) is a pivotal epigenetic modification that plays an essential role in DNA replication, repair, expression and differentiation. To gain insight into the biological functions of 4mC, it is critical to identify its modification sites in the genome. In recent years, deep learning has become increasingly popular and is frequently employed for 4mC site identification. However, a systematic analysis of how to build predictive models using deep learning techniques is still lacking. In this work, we first summarized all existing deep learning-based predictors and systematically analyzed their models, features and datasets. Then, using a typical standard dataset with three species (, , and ), we assessed the contribution of different model architectures, encoding methods and the attention mechanism to building a deep learning-based model for 4mC site prediction. After a series of optimizations, a convolutional-recurrent neural network architecture using one-hot encoding and an attention mechanism achieved the best overall prediction performance. Extensive comparison experiments were conducted on the same dataset. This work will be helpful for researchers who would like to build 4mC prediction models using deep learning in the future.
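As an illustration of the one-hot encoding named in the abstract, the following is a minimal sketch and not the authors' implementation. The function name `one_hot_encode`, the channel order A, C, G, T, and the all-zero row for unknown symbols are assumptions made for this example.

```python
# Minimal sketch of one-hot encoding for a DNA sequence.
# Assumptions (not from the paper): channel order A, C, G, T, and
# any other symbol (for example N) is encoded as an all-zero row.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide in `sequence`."""
    rows = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        index = ALPHABET.find(base)  # -1 when the symbol is not A/C/G/T
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows


# A 5-nucleotide toy sequence; a real input would be a fixed-length
# window centred on the candidate cytosine.
encoded = one_hot_encode("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

The resulting matrix (sequence length by 4) is the kind of input that a convolutional or recurrent layer can consume directly, without hand-crafted features.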