School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, People's Republic of China.
Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China.
BMC Med Inform Decis Mak. 2023 Apr 6;23(1):59. doi: 10.1186/s12911-023-02159-7.
With the prevalence of cerebrovascular disease (CD) and the increasing strain on healthcare resources, forecasting the healthcare demands of cerebrovascular patients has significant implications for optimizing medical resources.
In this study, we proposed a stacking ensemble model comprising four base learners (ridge regression, random forest, gradient boosting decision tree, and artificial neural network) and a meta learner (elastic net) to predict the daily number of hospital admissions (HAs) for CD from historical HA data, air quality data, and meteorological data collected in Chengdu, China, from 2015 to 2018. To address the label imbalance problem, a re-weighting method based on label distribution smoothing was integrated into the meta learner. We trained the model on data from 2015 to 2017 and evaluated its predictive ability on data from 2018 using four metrics: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). In addition, the SHapley Additive exPlanations (SHAP) framework was applied to explain the predictions of our stacking model.
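The re-weighting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lds_weights`, the Gaussian kernel, and the `kernel_size` and `sigma` defaults are assumptions made for illustration, and the labels are assumed to be integer daily admission counts. The sketch smooths the empirical label histogram and weights each sample by the inverse of the smoothed density, so that rarely observed admission counts carry more weight during fitting.

```python
import math
from collections import Counter


def lds_weights(labels, kernel_size=5, sigma=2.0):
    """Per-sample weights from a Gaussian-smoothed label histogram.

    labels: integer targets (e.g. daily admission counts).
    kernel_size and sigma are illustrative defaults, not values from the paper.
    """
    lo, hi = min(labels), max(labels)
    counts = Counter(labels)
    # Empirical label histogram with one bin per integer value in [lo, hi].
    hist = [counts.get(v, 0) for v in range(lo, hi + 1)]

    # Symmetric Gaussian kernel, normalised to sum to 1.
    half = kernel_size // 2
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma))
              for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]

    # Convolve the histogram with the kernel (bins outside the range count as 0).
    smoothed = []
    for i in range(len(hist)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(hist):
                s += w * hist[idx]
        smoothed.append(s)

    # Inverse smoothed density, rescaled so the weights average to 1.
    raw = [1.0 / smoothed[y - lo] for y in labels]
    scale = len(labels) / sum(raw)
    return [r * scale for r in raw]


# Made-up labels: common counts around 10-12 and one rare high-admission day.
example_labels = [10] * 8 + [11] * 6 + [12] * 4 + [20]
example_weights = lds_weights(example_labels)
```

Weights of this kind would typically be supplied as per-sample weights when fitting the meta learner, so that errors on rare high- or low-admission days contribute more to the loss.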
Our proposed model outperformed all the base learners and a long short-term memory (LSTM) network on two datasets. In particular, compared with the best results obtained by the individual models, the MAE, RMSE, and MAPE of the stacking model decreased by 13.9%, 12.7%, and 5.8%, respectively, and the R² improved by 6.8% on the CD dataset. The model explanation showed that environmental features helped to further improve model performance and indicated that high temperature and high concentrations of gaseous air pollutants might be strongly associated with an increased risk of CD.
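For reference, the four evaluation metrics can be computed as in the sketch below. The function name `regression_metrics` and the sample values are illustrative assumptions and are not data from the study. MAPE is expressed as a percentage and assumes that no true value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent), and R^2 for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a true value is 0; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Made-up daily admission counts and predictions, for illustration only.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])
```

Lower MAE, RMSE, and MAPE indicate smaller errors, whereas a higher R² indicates that more of the variance in daily admissions is explained, which is why the reported improvement appears as decreases in the first three metrics and an increase in the fourth.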
Our stacking model, which accounts for environmental exposure, is effective in predicting daily HAs for CD and has practical value for early warning and healthcare resource allocation.