Kim Sujin, Kim Woojae, Park Rae Woong
College of Communication and Information Studies and Department of Pathology and Laboratory Medicine, University of Kentucky, Lexington, KY, USA.
Healthc Inform Res. 2011 Dec;17(4):232-43. doi: 10.4258/hir.2011.17.4.232. Epub 2011 Dec 31.
The intensive care environment generates a wealth of critical care data suited to developing a well-calibrated prediction tool. This study aimed to develop an intensive care unit (ICU) mortality prediction model from University of Kentucky Hospital (UKH) data and to assess whether data mining techniques, namely the artificial neural network (ANN), support vector machine (SVM), and decision tree (DT), outperform the conventional logistic regression (LR) statistical model.
The models were built on ICU data from 38,474 admissions to the UKH between January 1998 and September 2007. Data from the first 24 hours of each ICU admission were used, including patient demographics, admission information, physiology data, chronic health items, and outcome information.
Only 15 study variables were identified as significant and included in model development. The DT algorithm (AUC, 0.892) slightly outperformed the other data mining techniques, the SVM (AUC, 0.876) and the ANN (AUC, 0.874). By comparison, the Acute Physiology and Chronic Health Evaluation (APACHE) III score achieved an AUC of 0.871.
While requiring fewer variables, the machine learning algorithms that we developed performed at least as well as the conventional APACHE III prediction, as measured by AUC.
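
The comparison above rests on the area under the ROC curve (AUC): the probability that a randomly chosen patient who died is assigned a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that calculation using the pairwise (Mann-Whitney) formulation. It is not the study's code; the function name `auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    labels: sequence of 0/1 outcomes (1 = died, an assumption of this sketch)
    scores: predicted risk for each case; higher means higher predicted risk
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Count positive/negative pairs that are ranked correctly.
    # A tie between a positive and a negative score counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.3]
toy_auc = auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which is why differences such as 0.892 versus 0.871 are read as modest. The pairwise loop is quadratic in the number of cases; for a dataset the size of the one described here, a rank-based implementation would be a more practical choice.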