Safaripour Razieh, June Lim Hyun Ja
Department of Community Health and Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, SK, Canada.
Health Informatics J. 2022 Apr-Jun;28(2):14604582221106396. doi: 10.1177/14604582221106396.
Emergency Department (ED) overcrowding is an emerging risk to patient safety. This study aims to assess and compare the predictive ability of machine learning (ML) models for predicting frequent ED users.
Korean Health Panel data from 2008 to 2015 were used for this study. Individuals with four or more visits per year were considered frequent ED users. Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM) as well as two ensemble models, namely Bagging and Voting, were trained and tested to examine their predictive performance.
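The outcome definition above (four or more ED visits per year) can be sketched as a simple labeling step. This is a minimal illustration, not the authors' code: the record layout (one `(person_id, year)` tuple per visit), the function name `label_frequent_users`, and the toy data are all assumptions made for the example.

```python
from collections import Counter

# Threshold taken from the study's definition of a frequent ED user.
FREQUENT_THRESHOLD = 4  # ED visits per person per year

def label_frequent_users(visits):
    """Label each (person_id, year) pair as a frequent ED user or not.

    `visits` is an iterable of (person_id, year) tuples, one per ED visit
    (a hypothetical layout; the real panel data is structured differently).
    """
    counts = Counter(visits)
    return {key: n >= FREQUENT_THRESHOLD for key, n in counts.items()}

# Hypothetical toy records: person A has 4 visits in 2010 and 1 in 2011,
# person B has 2 visits in 2010.
visits = [("A", 2010)] * 4 + [("B", 2010)] * 2 + [("A", 2011)]
labels = label_frequent_users(visits)
```

The resulting boolean labels would serve as the binary target for the classifiers; people with no ED visits in a year would need to be added separately as non-frequent users.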
The ML classification algorithms identified frequent ED users with high precision (90%-98%) and sensitivity (87%-91%), whereas LR showed only fair precision (65%) and sensitivity (67%). The ML algorithms achieved high area under the curve (AUC) values, ranging from 89% for SVM to 96% for RF, while LR showed the lowest AUC (65%). The classification error varied among algorithms: LR had the highest classification error (24.07%) and RF the lowest (3.8%).
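For readers less familiar with the reported metrics, the sketch below shows how precision, sensitivity, classification error, and AUC can be computed from true labels, predicted labels, and predicted scores. It assumes a binary coding (1 = frequent ED user, 0 = otherwise); the function names and the toy data are hypothetical and do not reproduce the study's results.

```python
def classification_metrics(y_true, y_pred):
    """Precision, sensitivity (recall), and classification error for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    error = (fp + fn) / len(pairs)  # share of misclassified cases
    return {"precision": precision, "sensitivity": sensitivity, "error": error}

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 3 frequent users, 5 non-frequent users.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

The pairwise AUC computation is quadratic in the sample size, which is fine for illustration; a rank-based formula would be preferable on data the size of a national health panel.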
Results show that ML classification algorithms are robust techniques to predict frequent ED users, and the variables in administrative health panels are reliable indicators for this purpose.