Huang Kian A, Hardin William M, Prakash Neelesh S
Radiology, University of South Florida Morsani College of Medicine, Tampa, USA.
Cureus. 2025 Jun 18;17(6):e86276. doi: 10.7759/cureus.86276. eCollection 2025 Jun.
Accurate forecasting of emergency department (ED) patient volumes is critical for optimizing hospital resource allocation and staffing. This preliminary study evaluates the performance of an eXtreme Gradient Boosting (XGBoost)-based regression model in predicting daily ED visit counts across three simulated hospitals, using time-series features derived from synthetic hospital data retrieved from a publicly available Kaggle dataset (n=300).
For each hospital, we trained an XGBoost model using engineered temporal features, recent lagged values, and rolling averages of past patient volumes. Feature engineering included day of the week, month, week of the year, quarter of the year, and weekend status. Model performance was benchmarked against three general baselines: a naive lag-1 predictor, a constant mean predictor, and a three-day rolling mean. Performance was assessed using mean squared error (MSE), root MSE (RMSE), mean absolute error (MAE), and R² score.
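The feature set and baselines described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the single lag, and the `window=3` default are assumptions (the abstract does not say how many lags were used), and the model-fitting step is omitted.

```python
from datetime import date, timedelta
from statistics import mean


def temporal_features(d: date) -> dict:
    """Calendar features named in the abstract, derived from one date."""
    return {
        "day_of_week": d.weekday(),            # Monday=0 ... Sunday=6
        "month": d.month,
        "week_of_year": d.isocalendar()[1],    # ISO week number
        "quarter": (d.month - 1) // 3 + 1,
        "is_weekend": int(d.weekday() >= 5),   # Saturday or Sunday
    }


def lag1_forecast(history):
    """Naive baseline: tomorrow equals the most recent observed count."""
    return history[-1]


def rolling_mean_forecast(history, window=3):
    """Baseline: mean of the last `window` observed counts."""
    return mean(history[-window:])


def constant_mean_forecast(train):
    """Baseline: always predict the mean of the training counts."""
    return mean(train)


def make_rows(dates, counts, window=3):
    """Build one feature row per day, using only counts from earlier days."""
    rows = []
    for t in range(window, len(counts)):
        history = counts[:t]                   # strictly past values, no leakage
        row = temporal_features(dates[t])
        row["lag_1"] = lag1_forecast(history)
        row["rolling_mean"] = rolling_mean_forecast(history, window)
        row["target"] = counts[t]
        rows.append(row)
    return rows


# Illustrative values only; these are not data from the study.
start = date(2025, 6, 16)
example_dates = [start + timedelta(days=i) for i in range(5)]
example_counts = [10, 20, 30, 40, 50]
example_rows = make_rows(example_dates, example_counts)
```

Each row pairs the calendar features with lagged information from earlier days only, so the same rows can feed a regression model and be scored against the baselines on identical targets.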
The XGBoost model consistently outperformed all baseline methods across all hospitals. For Hospital 101, it achieved an R² of 0.55 compared to 0.27 for the rolling mean and negative R² values for the naive and mean baselines. For Hospital 102, it achieved an R² of 0.69 versus 0.12 for the rolling mean. The best performance was observed at Hospital 103, where XGBoost achieved an R² of 0.81, substantially outperforming all baselines. Across all sites, XGBoost reduced RMSE and MAE by more than 40% relative to the best-performing baseline.
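A negative R² means a predictor's squared error on the evaluation data exceeds that of simply predicting the evaluation-set mean, which is why the naive and constant-mean baselines can fall below zero. The sketch below uses the standard metric definitions; the helper names and all numbers are illustrative assumptions, not values from the study.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def relative_reduction(model_error, baseline_error):
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical test-period counts and a constant predictor set to a
# training mean (90) that sits below every test value.
y_test = [100, 120, 110, 130]
constant_pred = [90] * len(y_test)

baseline_r2 = r2_score(y_test, constant_pred)      # negative: worse than the test mean
baseline_mae = mae(y_test, constant_pred)
baseline_rmse = rmse(y_test, constant_pred)
```

Under these definitions, a statement such as "reduced RMSE by more than 40%" corresponds to `relative_reduction(model_rmse, baseline_rmse) > 0.4`.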
Leveraging temporal and historical patterns in simulated ED data, the XGBoost model delivers markedly more accurate volume forecasts than traditional baseline methods. These findings on synthetic data support the potential for machine learning-based forecasting models in enhancing hospital operational decision-making, with future directions involving the use of real-world hospital data.