College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, China.
PLoS Comput Biol. 2023 Aug 28;19(8):e1011370. doi: 10.1371/journal.pcbi.1011370. eCollection 2023 Aug.
DNA methylation plays a critical role in regulating gene expression by affecting DNA stability and altering chromosome structure. Identifying DNA methylation modification sites therefore lays a solid basis for understanding their biological functions. Existing machine learning-based methods for predicting DNA methylation have not fully exploited the multidimensional information hidden in DNA sequences, which significantly limits their prediction accuracy. In addition, most models have been built for a single methylation type. To address these issues, this study proposes a deep learning-based method for DNA methylation site prediction, termed the MEDCNN model. The MEDCNN model extracts feature information from gene sequences along three dimensions (positional information, biological information, and chemical information). The method employs a convolutional neural network with two convolutional layers and two fully connected layers, trained by iterative gradient descent on a cross-entropy loss function to increase prediction accuracy. The MEDCNN model can also predict different types of DNA methylation sites. The experimental results indicate that the deep learning method based on multidimensional coding outperformed single coding methods, and that the MEDCNN model was highly applicable and outperformed existing models in predicting DNA methylation across different species. These findings suggest that the MEDCNN model can be effective in predicting DNA methylation sites.
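To make the idea of multidimensional sequence coding concrete, the following is a minimal sketch of how several per-nucleotide encodings can be concatenated into one feature matrix. The abstract does not name the specific encodings used by MEDCNN, so everything here is an assumption: one-hot coding stands in for the positional channel, the widely used nucleotide chemical property (NCP) triplet stands in for the chemical channel, and a running nucleotide density is only a placeholder for the third channel. The names `ONE_HOT`, `NCP`, `nucleotide_density`, and `encode` are hypothetical.

```python
# Illustrative sketch only: the encodings below are assumptions, not the
# encodings confirmed by the abstract.

# One-hot coding of each base (assumed stand-in for the positional channel).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Nucleotide chemical property (NCP) coding:
# (ring structure: purine=1, functional group: amino=1, hydrogen bond: weak=1).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def nucleotide_density(seq):
    """Return, for each position, the frequency of that base in the prefix
    ending at that position (placeholder for a third feature channel)."""
    counts = {}
    density = []
    for i, base in enumerate(seq, start=1):
        counts[base] = counts.get(base, 0) + 1
        density.append(counts[base] / i)
    return density


def encode(seq):
    """Concatenate the three codings into one row of 8 features per base."""
    seq = seq.upper()
    unknown = set(seq) - set(ONE_HOT)
    if unknown:
        raise ValueError(f"unsupported characters: {sorted(unknown)}")
    density = nucleotide_density(seq)
    return [
        list(ONE_HOT[base]) + list(NCP[base]) + [d]
        for base, d in zip(seq, density)
    ]


if __name__ == "__main__":
    for row in encode("ACGA"):
        print(row)
```

A matrix of this shape (sequence length by feature count) is the kind of input a convolutional network with two convolutional and two fully connected layers could consume; combining channels this way is what lets the network see more than a single coding scheme provides.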