Zhang Chuanwu, Garrard Lili, Keighley John, Carlson Susan, Gajewski Byron
Department of Biostatistics, University of Kansas Medical Center, Mail Stop 1026, 3901 Rainbow Blvd., Kansas City, KS, 66160, USA.
Division of Biometrics III, OB/OTS/CDER, U.S. Food and Drug Administration, Silver Spring, MD, 20993, USA.
BMC Pregnancy Childbirth. 2017 Jan 10;17(1):18. doi: 10.1186/s12884-016-1189-0.
Despite the widely recognized association between the severity of early preterm birth (ePTB) and its related severe diseases, little is known about the potential risk factors for ePTB or about the subpopulations at high risk of ePTB. Motivated by a future confirmatory clinical trial that will test whether docosahexaenoic acid (DHA) supplementation in pregnant women affects ePTB prevalence differently in a high-risk subgroup, this study aims to identify potential risk subgroups and risk factors for ePTB, defined as birth before 34 weeks of gestation.
The analysis data (N = 3,994,872) were obtained from the CDC and NCHS 2014 Natality public data file. The sample was split into independent training and validation cohorts for model generation and model assessment, respectively. Logistic regression and classification and regression tree (CART) models were used to examine potential ePTB risk predictors and their interactions. The candidate predictors were the mother's age, nativity, race, Hispanic origin, marital status, education, pre-pregnancy smoking status, pre-pregnancy BMI, pre-pregnancy diabetes status, pre-pregnancy hypertension status, previous preterm birth status, infertility treatment use, fertility-enhancing drug use, and delivery payment source.
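The split into independent training and validation cohorts can be illustrated with a minimal sketch. The abstract does not state the split proportion or how records were assigned, so the 50/50 fraction, the fixed seed, and the function name `split_cohorts` are all assumptions made for illustration, not the authors' procedure.

```python
import random


def split_cohorts(records, train_fraction=0.5, seed=2014):
    """Shuffle records reproducibly and split them into two disjoint cohorts.

    train_fraction and seed are illustrative assumptions; the abstract
    does not report the proportion or the randomization used.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy usage: ten hypothetical record identifiers.
training, validation = split_cohorts(range(10))
```

Because every record lands in exactly one cohort, the validation cohort is never seen while the model is being built, which is what makes the validation C-index an honest check.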
The logistic regression models with 14 and with 10 ePTB risk factors produced the same C-index (0.646) in the training cohort. The C-index of the logistic regression model based on 10 predictors was 0.645 in the validation cohort. Both C-indexes indicated good discrimination and acceptable model fit. The CART model identified preterm birth history and race as the most important risk factors, and revealed that the subgroup with a preterm birth history and a race designation of Black had the highest risk for ePTB. The C-index and misclassification rate of the CART model were 0.579 and 0.034 in the training cohort, and 0.578 and 0.034 in the validation cohort, respectively.
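For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how they are commonly computed for a binary outcome. For a binary outcome the C-index is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The function names, the 0.5 classification threshold, and the toy data are assumptions for illustration; the abstract does not describe the software or the threshold the authors used.

```python
def c_index(outcomes, risks):
    """Concordance index for a binary outcome (1 = event, 0 = no event).

    Compares every case with every non-case: a pair is concordant when
    the case has the higher predicted risk; a tie counts as 0.5.
    """
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    controls = [r for y, r in zip(outcomes, risks) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def misclassification_rate(outcomes, risks, threshold=0.5):
    """Share of records whose thresholded prediction disagrees with the outcome.

    The 0.5 threshold is an assumption made for this sketch.
    """
    wrong = sum(1 for y, r in zip(outcomes, risks) if (r >= threshold) != (y == 1))
    return wrong / len(outcomes)


# Toy data (hypothetical, not from the study).
outcomes = [1, 0, 0, 1, 0]
risks = [0.9, 0.2, 0.6, 0.4, 0.4]
toy_c = c_index(outcomes, risks)
toy_error = misclassification_rate(outcomes, risks)
```

The pairwise loop is quadratic in the number of records, so it suits small examples only; for a data set of millions of births a rank-based computation would be needed. The two metrics also answer different questions: when the outcome is rare, a low misclassification rate can coexist with a modest C-index.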
This study identified 14 maternal characteristics that reliably indicated risk for ePTB in a logistic regression model, a CART model, or both. Moreover, both models efficiently identified risk subgroups that can inform the design of a future enrichment clinical trial.