Teng Xiaojing, Liu Mengting, Wang Zhiyi, Dong Xueyan
Department of Laboratory Medicine, Affiliated Hangzhou First People's Hospital, School of Medicine, Westlake University, Hangzhou, Zhejiang, China.
The Fourth School of Clinical Medicine, Zhejiang Chinese Medical University (Hangzhou First People's Hospital), Hangzhou, China.
Sci Rep. 2025 Mar 25;15(1):10213. doi: 10.1038/s41598-025-92814-y.
Preterm birth (PTB), defined as delivery before 37 weeks, affects 15 million infants annually, accounting for 11% of live births and over 35% of neonatal deaths. While advanced maternal age (≥ 35 years) is a known risk factor, PTB risk in women under 35 is underexplored. This study aimed to develop a machine learning-based model for PTB prediction in women under 35. A retrospective cohort of 2606 cases (2019-2022) equally split between full-term and preterm births was analyzed. Logistic Regression, LightGBM, Gradient Boosting Decision Tree (GBDT), and XGBoost models were evaluated. External validation was conducted using 803 independent cases (2023). Model performance was assessed using area under the curve (AUC), accuracy, sensitivity, and specificity. SHAP (SHapley Additive exPlanations) values were used to interpret model predictions. The XGBoost model demonstrated superior performance with an AUC of 0.893 (95% CI: 0.860-0.925) on the validation set. In comparison, Logistic Regression, LightGBM, and GBDT achieved AUCs of 0.872, 0.840, and 0.879, respectively. External validation of the XGBoost model yielded an AUC of 0.91 (95% CI: 0.889-0.931). SHAP analysis highlighted seven key predictors: alkaline phosphatase (ALP), alpha-fetoprotein (AFP), hemoglobin (HGB), urea (UREA), lymphocyte count (Lym1), sodium (Na), and red cell distribution width coefficient of variation (RDWCV). The XGBoost model provides accurate PTB risk prediction and key insights for early intervention in women under 35, supporting its potential clinical utility.
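The abstract reports AUC, accuracy, sensitivity, and specificity but does not give the evaluation code. The following is a minimal sketch of how these four metrics are computed from true labels and predicted probabilities; it is not the authors' pipeline. The function names `roc_auc` and `threshold_metrics`, the 0.5 decision threshold, and the eight toy cases are all assumptions made for illustration, and the numbers bear no relation to the study data.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed probability cut-off.
    The 0.5 default is an illustrative assumption, not a value from the paper."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_preterm = s >= threshold
        if y == 1 and predicted_preterm:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_preterm:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical toy data: 1 = preterm, 0 = full-term; scores are model probabilities.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(roc_auc(toy_labels, toy_scores))            # 0.875
print(threshold_metrics(toy_labels, toy_scores))  # all three metrics are 0.75
```

AUC is independent of the threshold, while accuracy, sensitivity, and specificity all change with the chosen cut-off, which is why studies of this kind usually report AUC alongside the threshold-based metrics.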