Courtney Karen L, Stewart Sara, Popescu Mihail, Goodwin Linda K
School of Nursing, University of Pittsburgh, PA 15261 USA.
Stud Health Technol Inform. 2008;136:555-60.
Demographic factors have been shown to be moderate predictors of preterm birth in prior studies that used hospital databases and epidemiologic sample surveys. This retrospective study used de-identified 2003 North Carolina birth certificate data (n = 73,040) and replicated the statistical and computational methods used in a prior study of an academic medical center's data warehouse. Receiver operating characteristic (ROC) curves were used to compare results across methods. Because the data collected for birth certificates differ from those in the original clinical database, only five of the seven demographic variables in the clinical database model were available for model testing (maternal age, marital status, race/ethnicity, education, and county). Even with this reduced model, multiple statistical and computational modeling methods supported the earlier findings of demographic predictors of preterm birth. The reduced-model AUC (area under the ROC curve) results were acceptable (logistic regression = 0.605, neural networks = 0.57, SVM = 0.57, Bayesian classifiers = 0.59, and CART = 0.56), but lower than in the prior study, as might be expected for a reduced model. At the population level, these results support a prior demographic-predictor model of preterm birth generated from a clinical database, as well as the use of computational methods for model formation. Further testing of stronger predictor models within birth certificate data is suggested, because birth certificate data form a parsimonious population dataset that is already routinely collected.
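To make the reported AUC values concrete, here is a minimal Python sketch of how an AUC can be computed from a model's predicted scores. It uses the rank (Mann-Whitney) formulation: the probability that a randomly chosen preterm case scores higher than a randomly chosen term case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data or code from the study. The abstract does not state which software the authors used.

```python
def roc_auc(labels, scores):
    """Hypothetical helper: area under the ROC curve via the rank
    (Mann-Whitney) formulation. Returns the probability that a randomly
    chosen positive case (label 1) gets a higher score than a randomly
    chosen negative case (label 0); tied scores count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = preterm, 0 = term;
# scores are a model's predicted probabilities of preterm birth.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.8, 0.4, 0.3, 0.4, 0.6, 0.7]
print(round(roc_auc(labels, scores), 3))  # → 0.722
```

An AUC of 0.5 corresponds to chance-level ranking, and 1.0 corresponds to perfect separation. By this measure, the reduced-model values of 0.56 to 0.605 indicate discrimination modestly above chance.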