Explainable prediction of daily hospitalizations for cerebrovascular disease using stacked ensemble learning.

Affiliations

School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, People's Republic of China.

Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

BMC Med Inform Decis Mak. 2023 Apr 6;23(1):59. doi: 10.1186/s12911-023-02159-7.

Abstract

BACKGROUND

With the prevalence of cerebrovascular disease (CD) and the increasing strain on healthcare resources, forecasting the healthcare demands of cerebrovascular patients has significant implications for optimizing medical resources.

METHODS

In this study, a stacking ensemble model composed of four base learners (ridge regression, random forest, gradient boosting decision tree, and artificial neural network) and a meta learner (elastic net) was proposed for predicting the daily number of hospital admissions (HAs) for CD using historical HA data, air quality data, and meteorological data from Chengdu, China, from 2015 to 2018. To address the label imbalance problem, a re-weighting method based on label distribution smoothing was integrated into the meta learner. We trained the model on data from 2015 to 2017 and evaluated its predictive ability on data from 2018 using four metrics: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). In addition, the SHapley Additive exPlanations (SHAP) framework was applied to explain the predictions of the stacking model.
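The stacking design described above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the hyperparameters, the 5-fold out-of-fold scheme, the Gaussian kernel settings in the hypothetical `lds_weights` helper, and the synthetic data are all invented for the example. The only details taken from the abstract are the four base learners, the elastic net meta learner, the label-distribution-smoothing weights applied at the meta level, and the chronological train/test split.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def lds_weights(y, sigma=2.0, radius=5):
    """Inverse-frequency weights from a Gaussian-smoothed label histogram.

    Assumes integer labels (daily counts); sigma and radius are illustrative.
    """
    y = np.asarray(y, dtype=int)
    idx = y - y.min()
    counts = np.bincount(idx).astype(float)        # one bin per integer label
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # "full" convolution, then crop so bin i stays aligned with label i.
    smoothed = np.convolve(counts, kernel, mode="full")[radius:radius + len(counts)]
    w = 1.0 / smoothed[idx]                        # rare labels get larger weights
    return w / w.mean()                            # normalize to mean weight 1


def base_learners(seed=0):
    """The four base learner families named in the abstract (settings assumed)."""
    return [
        make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        RandomForestRegressor(n_estimators=100, random_state=seed),
        GradientBoostingRegressor(random_state=seed),
        make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=seed),
        ),
    ]


def fit_stack(X, y, seed=0):
    learners = base_learners(seed)
    cv = KFold(n_splits=5)  # assumption: plain k-fold for out-of-fold predictions
    Z = np.column_stack([cross_val_predict(m, X, y, cv=cv) for m in learners])
    for m in learners:
        m.fit(X, y)         # refit each base learner on all training data
    meta = ElasticNet(alpha=0.1, l1_ratio=0.5)
    meta.fit(Z, y, sample_weight=lds_weights(y))   # re-weighting at the meta level
    return learners, meta


def predict_stack(learners, meta, X):
    Z = np.column_stack([m.predict(X) for m in learners])
    return meta.predict(Z)


# Synthetic stand-in for daily admissions: earlier rows train, later rows test.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = rng.poisson(lam=np.exp(1.5 + 0.3 * X[:, 0]))
X_train, y_train, X_test = X[:240], y[:240], X[240:]

learners, meta = fit_stack(X_train, y_train)
pred = predict_stack(learners, meta, X_test)
```

Fitting the meta learner on out-of-fold predictions keeps it from seeing base-learner outputs on rows those learners were trained on. For real time series data, a forward-chaining split may be a better fit than plain k-fold; the abstract does not say which scheme was used.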

RESULTS

Our proposed model outperformed all the base learners and a long short-term memory (LSTM) network on two datasets. In particular, compared with the best results obtained by individual models, the MAE, RMSE, and MAPE of the stacking model decreased by 13.9%, 12.7%, and 5.8%, respectively, and the R² improved by 6.8% on the CD dataset. The model explanation showed that environmental features further improved model performance and indicated that high temperature and high concentrations of gaseous air pollutants may be strongly associated with an increased risk of CD.
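For reference, the four evaluation metrics and a percentage change against a baseline can be computed as in the sketch below. It assumes the reported percentages are relative changes against the best individual model, which the abstract does not state explicitly. The function names and the toy numbers are illustrative only and are not data from the study.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline` (negative = decrease)."""
    return (new - baseline) / baseline * 100.0


# Toy values chosen for easy hand-checking; not results from the paper.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 36.0])

scores = {
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

With this definition, an error metric that drops from 10 to 8 gives `relative_change(10, 8) == -20.0`, that is, a 20% decrease.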

CONCLUSIONS

Our stacking model, which accounts for environmental exposure, is effective in predicting daily HAs for CD and has practical value for early warning and healthcare resource allocation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ad1/10080841/acfb7db39a36/12911_2023_2159_Fig1_HTML.jpg
