Zeleke Addisu Jember, Palumbo Pierpaolo, Tubertini Paolo, Miglio Rossella, Chiari Lorenzo
Department of Electrical, Electronic, and Information Engineering Guglielmo Marconi, University of Bologna, Bologna, Italy.
Enterprise Information Systems for Integrated Care and Research Data Management, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy.
Front Artif Intell. 2023 Jul 28;6:1179226. doi: 10.3389/frai.2023.1179226. eCollection 2023.
This study aims to develop and compare different models to predict the Length of Stay (LoS) and the Prolonged Length of Stay (PLoS) of inpatients admitted through the emergency department (ED) in general patient settings. The aim is not to promote any specific model but rather to suggest a decision-support tool (i.e., a prediction framework).
We analyzed a dataset of patients admitted through the ED to the Sant'Orsola Malpighi University Hospital of Bologna, Italy, between January 1 and October 26, 2022. PLoS was defined as any hospitalization with LoS longer than 6 days. We applied six classification algorithms to predict PLoS: Random Forest (RF), Support Vector Machines (SVM), Gradient Boosting (GB), AdaBoost, K-Nearest Neighbors (KNN), and logistic regression (LoR). We evaluated the performance of these models with the Brier score, the area under the ROC curve (AUC), accuracy, sensitivity (recall), specificity, precision, and F1-score. We further developed eight regression models for LoS prediction: Linear Regression (LR); the penalized linear models Least Absolute Shrinkage and Selection Operator (LASSO), Ridge, and Elastic-net regression; Support Vector Regression; RF regression; KNN regression; and eXtreme Gradient Boosting (XGBoost) regression. Model performance was measured by mean squared error, mean absolute error, and mean relative error. The dataset was randomly split into a training set (70%) and a validation set (30%).
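As an illustration of the classification metrics listed above, the following is a minimal sketch in plain Python, not the authors' code. The 6-day threshold comes from the text; the function names, the 0.5 probability cutoff, and the toy data are assumptions made for the example.

```python
from typing import Dict, List, Sequence

PLOS_THRESHOLD_DAYS = 6  # from the text: PLoS is a LoS longer than 6 days


def label_plos(los_days: Sequence[float]) -> List[int]:
    """Return 1 for a prolonged stay (LoS > 6 days), else 0."""
    return [1 if d > PLOS_THRESHOLD_DAYS else 0 for d in los_days]


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half. Assumes both classes are present.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float = 0.5
) -> Dict[str, float]:
    """Threshold-based metrics; the 0.5 cutoff is an assumption."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    tp = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 1)
    tn = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 0)
    fp = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy data (hypothetical): observed LoS in days and predicted PLoS probabilities.
toy_los = [2, 9, 7, 4, 12, 6]
toy_prob = [0.2, 0.8, 0.4, 0.6, 0.9, 0.1]
toy_labels = label_plos(toy_los)
toy_report = confusion_metrics(toy_labels, toy_prob)
toy_report["brier"] = brier_score(toy_labels, toy_prob)
toy_report["auc"] = auc(toy_labels, toy_prob)
```

A lower Brier score indicates better-calibrated probabilities, whereas the threshold-based metrics depend on the chosen cutoff.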
A total of 12,858 eligible patients were included in our study, of whom 60.88% had a PLoS. The GB classifier best predicted PLoS (accuracy 75%, AUC 75.4%, Brier score 0.181), followed by the LoR classifier (accuracy 75%, AUC 75.2%, Brier score 0.182). Both models were also adequately calibrated. Ridge and XGBoost regressions best predicted LoS, with the smallest total prediction error. The overall prediction error is between 6 and 7 days, meaning there is a 6-7 day mean difference between actual and predicted LoS.
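To make the regression error measures concrete, here is a small sketch in plain Python under stated assumptions. The abstract does not define mean relative error, so the form used here (absolute error divided by the actual LoS, averaged over patients) is an assumption, as are the function names and the toy values.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of squared differences between actual and predicted LoS."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference, in days, between actual and predicted LoS."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| / actual.

    Assumed definition; requires every actual LoS to be greater than zero.
    """
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Toy data (hypothetical): actual and predicted LoS in days for four stays.
toy_actual = [3.0, 10.0, 7.0, 20.0]
toy_predicted = [5.0, 8.0, 7.0, 12.0]

toy_mse = mean_squared_error(toy_actual, toy_predicted)
toy_mae = mean_absolute_error(toy_actual, toy_predicted)
toy_mre = mean_relative_error(toy_actual, toy_predicted)
```

In this toy case a single long stay with an 8-day miss dominates the squared error, which is why absolute and relative errors are usually reported alongside it for skewed outcomes such as LoS.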
Our results demonstrate the potential of machine learning-based methods to predict LoS and provide valuable insights into the risks behind prolonged hospitalizations. In addition to physicians' clinical expertise, the results of these models can be utilized as input to make informed decisions, such as predicting hospitalizations and enhancing the overall performance of a public healthcare system.