Feng Yuxin, Ni Ying, Wang Wenkai, Guo Fen, Wang Liyu, Zhu Fan, Zhang Luyao, Feng Ying
Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou, Jiangsu Province, 215000, China.
Gusu School, Nanjing Medical University, Suzhou, Jiangsu Province, 215000, China.
BMC Pregnancy Childbirth. 2025 Jul 22;25(1):784. doi: 10.1186/s12884-025-07884-7.
Preterm birth, defined as delivery before 37 weeks of gestation, is a major cause of neonatal morbidity and mortality. DNA methylation changes at CpG sites have been associated with the risk of preterm birth.
This study aimed to identify differentially methylated CpG sites in cord blood and to develop machine learning models based on these methylation changes to predict preterm birth risk.
Methylome data from 110 neonatal cord blood samples in the GSE110828 dataset (88 for training and 22 for testing) were analyzed to identify CpG sites that differ between preterm and full-term births. Key CpG sites were selected using Lasso, Elastic Net, and Random Forest. Forty-five predictive models were constructed and evaluated on accuracy, precision, recall, and F1 score.
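The four evaluation metrics named above can be illustrated with a minimal sketch. It assumes a binary labelling in which 1 denotes preterm and 0 denotes full-term; the function name `classification_metrics` and the toy labels are hypothetical and are not taken from the study or its data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier.

    Assumption: `positive` marks the preterm class (illustrative only).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels made up for illustration; they are not study results.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Reporting precision, recall, and F1 alongside accuracy matters when the preterm and full-term groups are unequal in size, since accuracy alone can look high for a model that rarely predicts the smaller class.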
Sixty-six CpG sites differed significantly between the preterm and full-term groups. Four models, including a Random Forest classifier built on Lasso-selected sites and a Gradient Boosting classifier built on Random Forest-selected sites, achieved the best predictive performance, each with a validation accuracy of 93.75%.
DNA methylation changes at CpG sites in cord blood are associated with preterm birth risk. CpG-based methylation models demonstrate high predictive accuracy and hold promise for early clinical risk assessment.