Centre for Clinical Research, North Denmark Regional Hospital, Hjørring, Denmark; Business Intelligence and Analysis, The North Denmark Region, Denmark.
Department of Computer Science, Aalborg University, Aalborg, Denmark.
J Hosp Infect. 2024 Dec;154:112-121. doi: 10.1016/j.jhin.2023.03.017. Epub 2023 Mar 31.
Machine learning (ML) models for early identification of patients at risk of hospital-acquired urinary tract infection (HA-UTI) may enable timely and targeted preventive and therapeutic strategies. However, clinicians often find it difficult to interpret the predictions that ML models provide, and different models often differ in performance.
To train ML models that identify patients at risk of HA-UTI using data available in electronic health records at the time of hospital admission. This study focused on the performance of different ML models and on clinical explainability.
This retrospective study investigated patient data representing 138,560 hospital admissions in the North Denmark Region from 1 January 2017 to 31 December 2018. Fifty-one health, sociodemographic and clinical features were extracted as the full dataset, and the χ² test and expert knowledge were used for feature selection, resulting in two reduced datasets. Seven different ML models were trained and compared across the three datasets. The SHapley Additive exPlanation (SHAP) method was used to support population- and patient-level explainability.
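As an illustration of the χ² screening step mentioned above, the following minimal sketch computes Pearson's χ² statistic and its p-value for one binary feature against a binary outcome. It is not the study's code: the function name `chi2_2x2`, the toy counts, and the idea of a binary "catheter" feature are all hypothetical, and the sketch assumes every row and column of the 2×2 table is non-empty.

```python
from math import erfc, sqrt

def chi2_2x2(feature, outcome):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for two equally long sequences of 0/1 values."""
    # Observed counts, indexed as table[feature_value][outcome_value].
    table = [[0, 0], [0, 0]]
    for f, o in zip(feature, outcome):
        table[f][o] += 1
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence (assumed non-zero here).
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical toy data: 40 admissions with the feature, 40 without.
catheter = [1] * 40 + [0] * 40
ha_uti = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30
stat, p_value = chi2_2x2(catheter, ha_uti)
print(f"chi2 = {stat:.2f}, p = {p_value:.2g}")
```

In a screening setting, features with a small p-value would be candidates to keep; how the study combined this with expert knowledge is not described in the abstract.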
The best-performing ML model was the neural network model based on the full dataset, with an area under the curve (AUC) of 0.758. The neural network model was also the best-performing ML model based on the reduced datasets, with an AUC of 0.746. Clinical explainability was demonstrated with a SHAP summary plot and a SHAP force plot.
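For readers unfamiliar with the AUC figures reported above, the sketch below shows one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive case receives the higher predicted risk, with ties counted as half. The function name `roc_auc` and the four toy predictions are assumptions for illustration only and are unrelated to the study data.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy example: two admissions with HA-UTI, two without.
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, risks))  # 3 of 4 pairs are ordered correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values. The pairwise loop is quadratic in the number of cases, so a rank-based implementation would be preferable for a dataset of this size.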
Within 24 h of hospital admission, the ML models were able to identify patients at risk of developing HA-UTI, providing new opportunities to develop efficient strategies for the prevention of HA-UTI. SHAP was used to demonstrate how risk predictions can be explained at the individual patient level and for the patient population in general.
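The patient-level explanations described above rest on Shapley values, which split the difference between one patient's predicted risk and a baseline prediction into per-feature contributions. The sketch below computes exact Shapley values by enumerating all feature coalitions for a tiny, made-up risk function. It does not use the SHAP library, and the function `toy_risk`, its coefficients, and the three features are hypothetical; in practice SHAP approximates these values, since exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_risk(z):
    # Hypothetical risk score with one interaction between features 0 and 2.
    return 0.1 + 0.3 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[2]

patient = [1, 1, 1]
reference = [0, 0, 0]
phi = shapley_values(toy_risk, patient, reference)
print([round(v, 3) for v in phi])
# The contributions sum to toy_risk(patient) - toy_risk(reference).
print(round(sum(phi), 3))
```

The interaction term is shared equally between the two features involved, and the contributions add up to the gap between the patient's prediction and the baseline, which is the additive property that a force plot visualises.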