Horky Alex, Wasenitz Marita, Iacovella Carlotta, Bahlmann Franz, Al Naimi Ammar
Department of Obstetrics and Gynecology, Buergerhospital - Dr. Senckenberg Foundation, Nibelungenallee 37-41, 60318, Frankfurt, Hessen, Germany.
Department of Obstetrics and Prenatal Medicine, Goethe University, University Hospital of Frankfurt, Hessen, Germany.
Arch Gynecol Obstet. 2025 Apr 29. doi: 10.1007/s00404-025-08042-2.
The aim of this study is to investigate the differences in the accuracy of sonographic antenatal fetal weight estimation at term with artificial intelligence (AI) compared to that of clinical sonographers at different levels of experience.
This is a prospective cohort study in which pregnant women at term scheduled for an imminent elective cesarean section were recruited. For each fetus, three independent antenatal fetal weight estimations were obtained in a blinded fashion by an experienced resident physician with level I qualification from the German Society for Ultrasound in Medicine (group 1), a senior physician with level II qualification (group 2), and an AI-supported algorithm (group 3), all using Hadlock formula 3. The differences between each group's estimates and the actual birth weight were examined with a paired t-test. An estimate within 10% of birth weight was deemed accurate, and the diagnostic accuracies of groups 1 and 3 compared to group 2 were assessed using receiver operating characteristic (ROC) curves. The association between accuracy and potential influencing factors, including gestational age, fetal position, maternal age, maternal body mass index (BMI), twin pregnancy, neonatal gender, placental position, gestational diabetes, and amniotic fluid index, was tested with univariate logistic regression. A sensitivity analysis was conducted in which the estimated weights were increased by 25 grams (g) for each day between examination and birth.
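The accuracy criterion and the sensitivity adjustment described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the function names are hypothetical, the 10% band is assumed to be inclusive at its boundary, and the example weights are invented for demonstration.

```python
# Illustrative sketch only; names and example values are assumptions,
# not taken from the study.

DAILY_GAIN_G = 25.0   # daily weight gain used in the sensitivity analysis
TOLERANCE = 0.10      # an estimate within 10% of birth weight counts as accurate


def is_accurate(efw_g: float, birth_weight_g: float,
                tolerance: float = TOLERANCE) -> bool:
    """Return True if the estimate lies within +/- tolerance of birth weight.

    The boundary is treated as inclusive, which is an assumption.
    """
    return abs(efw_g - birth_weight_g) <= tolerance * birth_weight_g


def adjust_for_interval(efw_g: float, days_to_delivery: int,
                        daily_gain_g: float = DAILY_GAIN_G) -> float:
    """Add the assumed daily gain for each day between scan and birth."""
    return efw_g + daily_gain_g * days_to_delivery


# Hypothetical fetus: estimated at 2900 g three days before a 3300 g birth.
raw = 2900.0
adjusted = adjust_for_interval(raw, days_to_delivery=3)  # 2900 + 3 * 25

print(is_accurate(raw, 3300.0))       # 400 g off, outside the 330 g band
print(is_accurate(adjusted, 3300.0))  # 325 g off, inside the 330 g band
```

The example shows how an underestimate just outside the 10% band can move inside it once the interval between examination and delivery is taken into account.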
A total of 300 fetuses at a mean gestational age of 38.7 ± 1.1 weeks were included in this study and examined a median of 2 (2-4) days prior to delivery. Average birth weight was 3264.6 ± 530.7 g, and the mean difference between the sonographic estimated fetal weight and birth weight was -203.6 ± 325.4 g, -132.2 ± 294.1 g, and -338.4 ± 606.2 g for groups 1, 2, and 3 respectively. The estimated weight was accurate in 62% (56.2%, 67.5%), 70% (64.5%, 75.1%), and 48.3% (42.6%, 54.1%) of cases for groups 1, 2, and 3 respectively. Compared to group 2, groups 1 and 3 achieved 55.7% (48.7%, 62.5%) and 68.6% (61.8%, 74.8%) sensitivity, 68.9% (58.3%, 78.2%) and 53.3% (42.5%, 63.9%) specificity, and 0.62 (0.56, 0.68) and 0.61 (0.55, 0.67) area under the ROC curve respectively. There was no association between accuracy and the investigated variables. In the sensitivity analysis, the accuracy increased to 68% (62.4%, 73.2%), 75% (69.7%, 79.8%), and 51.3% (45.5%, 57.1%), and the mean difference compared to birth weight changed to -136.1 ± 321.8 g, -64.7 ± 291.2 g, and -270.7 ± 605.2 g for groups 1, 2, and 3 respectively.
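The bracketed ranges reported with each accuracy proportion read as confidence intervals. As a rough illustration of how such an interval can be derived from a proportion, the sketch below computes a 95% Wilson score interval. The abstract does not state which interval method was used, so Wilson is an assumption here, and the count of 186 accurate estimates is back-calculated from 62% of 300 rather than reported in the source.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Two-sided Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for 95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return center - half_width, center + half_width


# Group 1: 62% of 300 fetuses corresponds to 186 accurate estimates
# (back-calculated, an assumption).
low, high = wilson_interval(186, 300)
print(f"{low:.3f} to {high:.3f}")
```

The result lands close to, but not exactly on, the reported bounds, which suggests a different method (for example an exact binomial interval) may have been used.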
The antenatal weight estimation by experienced specialists with high-level qualifications remains the gold standard and provides the highest precision. Nevertheless, the accuracy of this standard is less than 80% even after adjusting for daily weight gain. The tested AI-supported method exhibits high variability and requires optimization and validation before being reliably used in clinical practice.