Goodwin L, Maher S, Ohno-Machado L, Iannacchione M A, Crockett P, Dreiseitl S, Vinterbo S, Hammond W
Duke University, Durham, NC, USA.
Proc AMIA Symp. 2000:305-9.
Data mining methods were applied to a racially diverse sample (n = 19,970) of pregnant women and 1,622 variables collected in Duke's TMR electronic patient record over a 10-year period. Different statistical and data mining methods performed similarly when compared using receiver operating characteristic (ROC) curves. In the best results, seven demographic variables yielded an area under the curve (AUC) of 0.72, and adding hundreds of other clinical variables increased the AUC by only 0.03. The similarity of results across methods suggests that the results were data-driven rather than method-dependent, and that demographic variables may offer a small, parsimonious set of predictors with useful accuracy in a racially diverse population. Work to determine relevant variables for improved predictive accuracy is ongoing.
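The AUC comparison described in the abstract can be illustrated with a minimal sketch. It assumes the standard rank-based definition of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `auc`, the labels, and both sets of scores are hypothetical toy values chosen for illustration. They are not the study's data or models, and the abstract does not say how the authors computed their AUCs.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0, 0]
# Hypothetical scores from a model using a small set of variables.
small_model = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
# Hypothetical scores from a model using many more variables.
large_model = [0.9, 0.6, 0.5, 0.7, 0.4, 0.2, 0.1]

auc_small = auc(labels, small_model)
auc_large = auc(labels, large_model)
print(f"small model AUC: {auc_small:.2f}")
print(f"large model AUC: {auc_large:.2f}")
print(f"gain from extra variables: {auc_large - auc_small:.2f}")
```

The pairwise loop is quadratic in the sample size, which is fine for a toy example. For a sample of the size reported here, a rank-based or library implementation would be the more practical choice.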