Center for Gene Diagnosis and Department of Clinical Laboratory Medicine, Zhongnan Hospital of Wuhan University, Donghu Road 169, Wuhan, 430071, China.
Department of Clinical Laboratory Medicine, Renmin Hospital of Wuhan University, Wuhan, 430060, China.
Clin Epigenetics. 2022 Sep 30;14(1):122. doi: 10.1186/s13148-022-01343-2.
DNA methylation-regulated genes have been shown to be crucial participants in the development of coronary heart disease (CHD). Machine learning based on DNA methylation-regulated genes has great potential for identifying non-invasive predictive biomarkers and exploring new underlying mechanisms of CHD.
First, 2085 age- and gender-matched individuals from the Framingham Heart Study (FHS) were randomly divided into a training set and a validation set. We then integrated methylome and transcriptome data of peripheral blood leukocytes (PBLs) from the training set to examine the methylation and expression patterns of CHD-related genes. Through dimensionality reduction, a total of five hub DNA methylation-regulated genes were identified in CHD: ATG7, BACH2, CDKN1B, DHCR24 and MPO. Subsequently, the methylation and expression features of these hub genes were used to construct machine learning models for CHD prediction with LightGBM, XGBoost and Random Forest. The optimal model, built with LightGBM, showed favorable predictive capacity, with an AUC, sensitivity and specificity of 0.834, 0.672 and 0.864, respectively, in the validation set. Furthermore, the methylation and expression statuses of the hub genes were verified in monocytes using a methylation microarray and transcriptome sequencing. In monocytes from our study population, the methylation statuses of ATG7, DHCR24 and MPO and the expression statuses of ATG7, BACH2 and DHCR24 were consistent with those in PBLs from the FHS.
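The three validation metrics reported above can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the labels, scores and the 0.5 decision threshold below are hypothetical toy values (the abstract does not state the threshold used), and AUC is computed with the rank-based (Mann-Whitney) pairwise formulation rather than by tracing the ROC curve.

```python
from __future__ import annotations

from typing import Sequence


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = CHD case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases predicted as cases
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls predicted as controls
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls flagged as cases
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random case scores higher than a random control (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy validation data: true labels and model-predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
threshold = 0.5  # assumed cut-off for illustration only
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Note that AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why a model can pair a high specificity with a more modest sensitivity at the same AUC.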
Using machine learning, we identified five DNA methylation-regulated genes and built a predictive model for CHD based on them; these genes may provide clues to new epigenetic mechanisms of CHD.