University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States.
University of North Carolina Global Projects Zambia, Lusaka, Zambia.
PLoS One. 2019 Feb 27;14(2):e0198919. doi: 10.1371/journal.pone.0198919. eCollection 2019.
Globally, preterm birth is the leading cause of neonatal death, and its estimated prevalence and associated mortality are highest in low- and middle-income countries (LMICs). Accurate identification of preterm infants is important at the individual level, to guide appropriate clinical intervention, and at the population level, to inform policy decisions and resource allocation. Because early prenatal ultrasound is often unavailable in these settings, gestational age (GA) is commonly estimated by newborn assessment at birth. This approach assumes that the last menstrual period is unreliable and that birth weight cannot distinguish preterm infants from those who are small for gestational age (SGA). We sought to leverage machine learning algorithms incorporating maternal factors associated with SGA to improve the accuracy of preterm newborn identification in LMIC settings.
This study uses data from an ongoing obstetrical cohort in Lusaka, Zambia, that uses early pregnancy ultrasound to estimate GA. Our intent was to identify the best set of parameters commonly available at delivery for correctly categorizing births as either preterm (<37 weeks) or term, using GA assigned by early ultrasound as the gold standard. Trained midwives conducted a newborn assessment within 72 hours of birth and collected maternal and neonatal data at the time of delivery or shortly thereafter. The New Ballard Score (NBS), last menstrual period (LMP), and birth weight were each used individually to assign GA at delivery and to categorize each birth as preterm or term. In addition, machine learning techniques combined these measures with several maternal and newborn characteristics associated with prematurity and SGA to develop prediction models for GA at delivery and for preterm birth. The distribution and accuracy of all models were compared with early ultrasound dating. Within our live-born cohort to date (n = 862), the median GA at delivery by early ultrasound was 39.4 weeks (IQR: 38.3-40.3). Among assessed newborns with complete data included in this analysis (n = 468), the median GA by ultrasound was 39.6 weeks (IQR: 38.4-40.3). Using machine learning, we identified a combination of six accessible parameters (LMP, birth weight, twin delivery, maternal height, hypertension in labor, and HIV serostatus) that outperformed current GA prediction methods. For preterm birth prediction, this combination of covariates correctly classified >94% of newborns and achieved an area under the receiver operating characteristic curve (AUC) of 0.9796.
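The evaluation described above (dating a birth from LMP, applying the <37-week preterm cutoff, and scoring predictions against ultrasound-assigned GA by percent correctly classified and AUC) can be illustrated with a minimal sketch. It assumes that LMP-based GA is simply the number of days from the first day of the LMP to delivery divided by 7, and it computes AUC with the rank-based (Mann-Whitney) formulation. The function names are hypothetical, the toy inputs are not study data, and the sketch does not reproduce the authors' machine learning models; the risk scores passed to auc() stand in for whatever a fitted model would output.

```python
from datetime import date

# Preterm is defined in the abstract as GA < 37 weeks.
PRETERM_CUTOFF_WEEKS = 37.0


def ga_weeks_from_lmp(lmp: date, delivery: date) -> float:
    """Assumed LMP dating: days from first day of LMP to delivery, in weeks."""
    return (delivery - lmp).days / 7.0


def is_preterm(ga_weeks: float) -> bool:
    """Classify a birth as preterm (True) or term (False)."""
    return ga_weeks < PRETERM_CUTOFF_WEEKS


def percent_correct(predicted: list[bool], reference: list[bool]) -> float:
    """Percent of births whose predicted class matches the reference
    (e.g. ultrasound-assigned) class."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


def auc(scores: list[float], labels: list[bool]) -> float:
    """Rank-based AUC: probability that a randomly chosen preterm birth
    receives a higher score than a randomly chosen term birth (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one preterm and one term birth")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example only: 252 days after LMP is exactly 36.0 weeks.
    ga = ga_weeks_from_lmp(date(2018, 1, 1), date(2018, 9, 10))
    print(ga, is_preterm(ga))  # 36.0 True
```

In this framing, a model's predicted GA would be passed through is_preterm() before computing percent_correct() against the ultrasound classes, while AUC uses a continuous score (such as a predicted probability of preterm birth) so that it summarizes performance across all possible thresholds rather than at a single cutoff.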
We identified a parsimonious list of variables that machine learning approaches can use to improve the accuracy of preterm newborn identification. Our best-performing model included LMP, birth weight, twin delivery, HIV serostatus, and maternal factors associated with SGA. These variables are all easily collected at delivery, reducing the skill and time that frontline health workers need to assess GA.
ClinicalTrials.gov Identifier: NCT02738892.