Wei Jia, Zhou Jiandong, Zhang Zizheng, Yuan Kevin, Gu Qingze, Luk Augustine, Brent Andrew J, Clifton David A, Walker A Sarah, Eyre David W
Nuffield Department of Medicine, University of Oxford, Oxford, UK.
Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK.
Commun Med (Lond). 2024 Nov 18;4(1):236. doi: 10.1038/s43856-024-00673-x.
Accurately predicting hospital discharge events could help improve patient flow and the efficiency of healthcare delivery. However, using machine learning and diverse electronic health record (EHR) data for this task remains incompletely explored.
We used EHR data from February 2017 to January 2020 from Oxfordshire, UK to predict hospital discharges in the next 24 h. We fitted separate extreme gradient boosting models for elective and emergency admissions, trained on the first two years of data and tested on the final year of data. We examined individual-level and hospital-level model performance and evaluated the impact of training data size and recency, prediction time, and performance in subgroups.
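The temporal train/test design described above can be illustrated with a minimal sketch. The record fields (`date`, `admission`, `discharged_24h`), the toy values, and the exact cutoff of 1 February 2019 are assumptions made for illustration; the abstract states only that the first two years were used for training and the final year for testing, with separate models for elective and emergency admissions.

```python
from datetime import date

# Assumption: the "first two years" run from February 2017 to the end of
# January 2019, so the test period starts on 1 February 2019.
CUTOFF = date(2019, 2, 1)


def temporal_split(records, cutoff):
    """Split records into train (before cutoff) and test (on or after cutoff)."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def by_admission_type(records, admission):
    """Keep only one admission type, since each type gets its own model."""
    return [r for r in records if r["admission"] == admission]


# Hypothetical toy records; real EHR data would carry many more features.
records = [
    {"date": date(2017, 2, 1), "admission": "elective", "discharged_24h": 1},
    {"date": date(2018, 6, 15), "admission": "emergency", "discharged_24h": 0},
    {"date": date(2019, 1, 31), "admission": "emergency", "discharged_24h": 1},
    {"date": date(2019, 2, 1), "admission": "elective", "discharged_24h": 0},
    {"date": date(2020, 1, 31), "admission": "emergency", "discharged_24h": 1},
]

train, test = temporal_split(records, CUTOFF)
train_emergency = by_admission_type(train, "emergency")
test_emergency = by_admission_type(test, "emergency")
```

Splitting by date rather than at random keeps every test observation later than every training observation, which mirrors how such a model would be used prospectively.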
Our models achieve AUROCs of 0.87 and 0.86, AUPRCs of 0.66 and 0.64, and F1 scores of 0.61 and 0.59 for elective and emergency admissions, respectively. These models outperform a logistic regression model using the same features and are substantially better than a baseline logistic regression model with more limited features. Notably, the relative performance increase from adding features is greater than the increase from using a more sophisticated model. When individual probabilities are aggregated, daily total discharge estimates are accurate, with mean absolute errors of 8.9% (elective) and 4.9% (emergency). The most informative predictors include antibiotic prescriptions, medications, and hospital capacity factors. Performance remains robust across patient subgroups and different training strategies, but is lower in patients with longer admissions and those who died in hospital.
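One way to read the hospital-level result is that the expected number of discharges on a given day is the sum of the individual predicted probabilities for that day. The sketch below illustrates this under stated assumptions: the function names and toy numbers are hypothetical, and the error is expressed as a percentage of observed daily discharges, which may differ from the exact normalisation used in the study.

```python
def expected_daily_discharges(probs_by_day):
    """Sum individual discharge probabilities to get an expected count per day."""
    return {day: sum(probs) for day, probs in probs_by_day.items()}


def mean_absolute_percentage_error(predicted, observed):
    """Mean of |predicted - observed| / observed across days, as a percentage.

    Assumption: days with zero observed discharges are skipped to avoid
    division by zero.
    """
    errors = [
        abs(predicted[day] - observed[day]) / observed[day]
        for day in observed
        if observed[day] > 0
    ]
    return 100 * sum(errors) / len(errors)


# Hypothetical per-patient probabilities of discharge within 24 h.
probs_by_day = {
    "day_1": [0.5, 0.75, 0.75],        # expected total: 2.0
    "day_2": [0.25, 0.25, 0.5, 0.5],   # expected total: 1.5
}
observed = {"day_1": 2, "day_2": 2}

predicted = expected_daily_discharges(probs_by_day)
error_pct = mean_absolute_percentage_error(predicted, observed)
```

Summing probabilities, rather than counting patients above a classification threshold, lets well-calibrated individual predictions translate directly into an expected daily total.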
Our findings highlight the potential of machine learning in optimising hospital patient flow and facilitating patient care and recovery.