Yan Yuyao, Chai Xinyi, Liu Jiajun, Wang Sijia, Li Wenran, Huang Tao
CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
School of Life Sciences, Shanghai University, Shanghai, China.
BMC Bioinformatics. 2025 Apr 8;26(1):99. doi: 10.1186/s12859-025-06115-2.
Gene expression underlies cellular function, and DNA methylation is a critical epigenetic mechanism that regulates it. Here we propose DeepMethyGene, an adaptive recursive convolutional neural network model based on ResNet that predicts gene expression from DNA methylation data. The model transforms methylation Beta values to M values so that the inputs more closely follow a Gaussian distribution, dynamically adjusts the number of output channels according to the input dimension, and uses residual blocks to mitigate vanishing gradients when training very deep networks. In benchmarking against the state-of-the-art geneEXPLORE model (R = 0.449), DeepMethyGene (R = 0.640) demonstrated superior predictive performance. Further analysis revealed that the number of methylation sites and the average distance between these sites and the gene transcription start site (TSS) significantly affected prediction accuracy. By exploring the complex relationship between methylation and gene expression, this study provides theoretical support for disease progression prediction and clinical intervention. Relevant data and code are available at https://github.com/yaoyao-11/DeepMethyGene.
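The Beta-to-M transformation mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard definition M = log2(Beta / (1 - Beta)); the clipping constant `eps` and the function names `beta_to_m`, `m_to_beta`, and `pearson_r` are illustrative assumptions and are not taken from the DeepMethyGene code. The sketch also assumes that the reported R is a Pearson correlation coefficient, which the abstract does not state explicitly.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation Beta value in [0, 1] to an M value (log2 odds).

    eps is an assumed clipping constant that keeps the logarithm finite
    when beta is exactly 0 or 1.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: map an M value back to a Beta value in (0, 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences.

    Assumed here to be the metric behind the reported R values.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    betas = [0.05, 0.2, 0.5, 0.8, 0.95]
    m_values = [beta_to_m(b) for b in betas]
    # Beta values are bounded in [0, 1]; M values are unbounded and
    # symmetric around 0, which is closer to what Gaussian-based models expect.
    for b, m in zip(betas, m_values):
        print(f"beta={b:.2f} -> M={m:+.3f}")
```

A Beta value of 0.5 maps to M = 0, and values near 0 or 1 are stretched toward large negative or positive M values, which reduces the compression of variance at the extremes of the Beta scale.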