Liu Ya, Liu Jiangling, Shen Heqing
State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory and State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, China.
Department of Obstetrics, Women and Children's Hospital, School of Medicine, Xiamen University, Xiamen, China.
Int J Gynaecol Obstet. 2025 Apr;169(1):332-340. doi: 10.1002/ijgo.16036. Epub 2024 Nov 18.
This study sought to develop a multifactorial predictive model for preterm birth risk, with the goal of giving clinical practitioners a tool to support early prevention.
This retrospective cohort study used 2022 and 2018 National Vital Statistics System (NVSS) birth data. The 2022 cohort was randomly split into training (70%) and internal validation (30%) subsets, and the 2018 cohort was used for external validation. Four machine learning algorithms (logistic regression, adaptive lasso regression, bootstrap forest, and boosted trees) were used to identify features associated with preterm birth. The consensus features identified across the four models were then combined to construct a logistic regression-based preterm birth prediction nomogram. To evaluate the model's performance, calibration curves, receiver operating characteristic (ROC) curves, and decision curve analysis were applied to both the internal and external validation sets.
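The 70/30 split and the consensus-feature step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `split_70_30` and `consensus_features`, the seed, the placeholder feature names, and the per-model selections are all hypothetical, and the abstract does not say which software performed these steps.

```python
import random


def split_70_30(record_ids, seed=0):
    """Shuffle record IDs and split them into 70% training / 30% internal validation."""
    ids = list(record_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    rng.shuffle(ids)
    cut = int(round(0.7 * len(ids)))
    return ids[:cut], ids[cut:]


def consensus_features(selected_by_model):
    """Return the features that every model selected, in sorted order."""
    feature_sets = [set(features) for features in selected_by_model.values()]
    return sorted(set.intersection(*feature_sets))


# Hypothetical feature selections; the names are placeholders, not the study's risk factors.
selected = {
    "logistic_regression": ["feature_a", "feature_b", "feature_c"],
    "adaptive_lasso": ["feature_a", "feature_c", "feature_d"],
    "bootstrap_forest": ["feature_a", "feature_c", "feature_e"],
    "boosted_trees": ["feature_a", "feature_b", "feature_c", "feature_e"],
}

train_ids, valid_ids = split_70_30(range(1000))
common = consensus_features(selected)
print(len(train_ids), len(valid_ids), common)
```

Taking the intersection keeps only predictors that all four selection methods agree on, which is one plausible reading of "consensus features"; a looser rule (for example, selected by at least three of four models) would be an equally valid design choice.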
The study included 2 567 040 mother-infant pairs from the 2022 cohort and 2 688 568 mother-infant pairs from the 2018 cohort. All four machine learning models achieved an area under the curve (AUC) above 0.7 in predicting preterm birth, and the internal validation results indicated good model generalizability. Feature selection identified nine risk factors for preterm birth that were common to all four models. The prediction nomogram based on these nine shared features achieved AUCs of 0.701, 0.702, and 0.704 in the training, internal validation, and external validation sets, respectively. The calibration curves showed good agreement between predicted and observed risk, and the decision curve analysis indicated a net clinical benefit for the model.
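For readers unfamiliar with the two headline metrics, the sketch below shows how an AUC and a decision-curve net benefit could be computed from predicted risks. It is an illustration only: the function names `auc` and `net_benefit` and the eight toy outcomes and scores are hypothetical and are not data from the study. The net benefit uses the standard decision curve definition, true positives per patient minus false positives per patient weighted by the odds of the threshold probability.

```python
def auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Net benefit at a risk threshold pt: TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = preterm birth, 0 = term birth.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.8]

toy_auc = auc(y_true, y_score)
nb_at_50 = net_benefit(y_true, y_score, 0.5)
nb_at_25 = net_benefit(y_true, y_score, 0.25)
print(toy_auc, nb_at_50, nb_at_25)
```

The pairwise AUC loop is quadratic in sample size, so it suits small illustrations only; for cohorts of millions of records a rank-based implementation would be needed.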
This study developed a reliable preterm birth prediction tool using large-scale birth cohort data, addressing the lack of external validation in existing preterm birth prediction models.