Ni Hongyan, Miao Jinli, Chen Jian
Department of Maternity Care, PingHu Maternal and Child Health Hospital, Jiaxing, Zhejiang, 314200, People's Republic of China.
The Yangtze River Delta Biological Medicine Research and Development Center of Zhejiang Province, Yangtze Delta Region Institution of Tsinghua University, Hangzhou, Zhejiang, 314006, People's Republic of China.
Int J Gen Med. 2025 Apr 26;18:2263-2274. doi: 10.2147/IJGM.S513064. eCollection 2025.
Gestational diabetes mellitus (GDM) poses serious health risks to both mothers and fetuses. However, effective tools for identifying GDM are lacking. This study, based on a Chinese cohort, aims to construct a traditional logistic regression (LR) model and six advanced machine learning (ML) models and to compare their predictive performance, thereby aiding the early identification of GDM and timely intervention.
This retrospective study utilized medical examination data from 956 singleton pregnant women collected between January and December 2023 from ten maternal and child health hospitals in Pinghu City. We employed receiver operating characteristic curves and precision-recall curves to assess the predictive performance of the models. Decision curve analysis (DCA) was used to evaluate clinical utility, while calibration curves and Hosmer-Lemeshow (HL) tests were applied to assess the calibration of each model.
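Two of the evaluation tools named above can be illustrated with a minimal sketch. It assumes the standard definitions of net benefit (the quantity plotted in DCA) and of the Hosmer-Lemeshow statistic; the function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as plotted in a decision curve."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic and its degrees of freedom."""
    pairs = sorted(zip(y_prob, y_true))  # order cases by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        variance = size * mean_p * (1 - mean_p)
        if variance > 0:  # skip degenerate groups with mean risk 0 or 1
            stat += (observed - expected) ** 2 / variance
    return stat, groups - 2


# Toy data (hypothetical, for illustration only).
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
nb_model = net_benefit(labels, probs, threshold=0.3)
# "Treat all" reference strategy: every case is classified as positive.
nb_treat_all = net_benefit(labels, [1.0] * len(labels), threshold=0.3)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" and "treat none" (net benefit 0) strategies. The HL statistic is compared against a chi-square distribution with the returned degrees of freedom; a large p-value indicates no evidence of miscalibration.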
The 956 participants were randomly divided into a training set and a validation set at a 3:1 ratio. Thirteen features, identified through Spearman correlation analysis and the Boruta algorithm, were used to construct the models. The LR model achieved the highest AUC at 0.787 (0.723-0.85), outperforming the ML models, including random forest (RF) at 0.776 (0.711-0.841). Furthermore, the LR model showed good calibration and clinical utility.
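The random 3:1 split and the AUC comparison can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the seed, the function names, and the toy scores are assumptions. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import random


def split_3_to_1(ids, seed=0):
    """Randomly split identifiers into training and validation sets at 3:1."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 3 // 4
    return shuffled[:cut], shuffled[cut:]


def auc(y_true, y_score):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# 956 participants, as in the study; the identifiers are placeholders.
train_ids, valid_ids = split_3_to_1(range(956))

# Toy validation scores (hypothetical, for illustration only).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With an exact 3:1 split, 956 participants yield 717 training and 239 validation cases; the abstract does not state the set sizes actually used.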
Although ML has tremendous potential, the ML models did not completely outperform the traditional LR model in predicting the occurrence of GDM from common early pregnancy data. In this setting, simpler traditional models may be more effective than complex ML approaches.