An ensemble approach improves the prediction of the COVID-19 pandemic in South Korea.

Authors

Han Kyulhee, Apio Catherine, Song Hanbyul, Lee Bogyeom, Hu Xuwen, Park Jiwon, Zhe Liu, Goo Taewan, Park Taesung

Affiliations

Interdisciplinary Program of Bioinformatics, Seoul National University, Seoul, Korea.

Department of Industrial Engineering, Seoul National University, Seoul, Korea.

Publication information

J Glob Health. 2025 Mar 28;15:04079. doi: 10.7189/jogh.15.04079.

DOI: 10.7189/jogh.15.04079
PMID: 40146993
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11949510/

Abstract

BACKGROUND

Modelling can contribute to disease prevention and control strategies. Accurate predictions of future cases and mortality rates were essential for establishing appropriate policies during the COVID-19 pandemic. However, no single model yielded definite conclusions, with each having specific strengths and weaknesses. Here we propose an ensemble learning approach which can offset the limitations of each model and improve prediction performances.

METHODS

We generated predictions for the transmission and impact of COVID-19 in South Korea using seven individual models, including mathematical, statistical, and machine learning approaches. We integrated these predictions using three ensemble methods: stacking, average, and weighted average ensemble (WAE). We used training and test errors to measure a model's performance and selected the best covariate combinations based on the lowest training error. We then evaluated model performance using five error measures (r, weighted mean absolute percentage error (WMAPE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE)) and selected the optimal covariate combination accordingly. To validate the generalisability of our approach, we applied the same modelling framework to USA data.
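
The error measures and the two simpler ensemble schemes named above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: r is assumed to be Pearson's correlation coefficient, WMAPE is assumed to be total absolute error divided by total absolute actual value, and the WAE weights are assumed to be proportional to the inverse of each model's training error (the abstract does not state the weighting scheme). Stacking is left out because it requires fitting a meta-learner. All function names and the sample numbers are hypothetical.

```python
import math


def wmape(actual, predicted):
    """Weighted MAPE: total absolute error over total absolute actual value."""
    total_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_abs_error / sum(abs(a) for a in actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; undefined if any actual is 0."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def average_ensemble(model_predictions):
    """Unweighted mean of all models' predictions at each time point."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


def weighted_average_ensemble(model_predictions, train_errors):
    """Weighted mean; weights are inverse training errors (an assumption)."""
    inverse = [1.0 / e for e in train_errors]
    weights = [w / sum(inverse) for w in inverse]
    return [
        sum(w * v for w, v in zip(weights, values))
        for values in zip(*model_predictions)
    ]


# Made-up daily counts and two made-up model forecasts, for illustration only.
actual = [100, 200, 300, 400]
model_a = [110, 190, 310, 390]
model_b = [80, 220, 280, 420]

avg = average_ensemble([model_a, model_b])
wae = weighted_average_ensemble(
    [model_a, model_b],
    [wmape(actual, model_a), wmape(actual, model_b)],
)
for name, pred in [("model_a", model_a), ("model_b", model_b),
                   ("average", avg), ("WAE", wae)]:
    print(f"{name}: WMAPE={wmape(actual, pred):.3f}, RMSE={rmse(actual, pred):.2f}")
```

In this toy example the two models err in opposite directions, so both ensembles reduce the error; real forecasts will not generally cancel so cleanly, and in practice the weights would be computed from training-period errors only.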

RESULTS

Booster shot rate + Omicron variant BA.5 rate was the most commonly selected combination of covariates. For raw data evaluated using the WMAPE, individual models achieved the following values: 0.244 for generalised additive modelling (GAM) on the daily number of confirmed cases, 0.172 for time series Poisson on the daily number of confirmed deaths, and 0.022 for both autoregressive integrated moving average (ARIMA) and time series Poisson on the daily number of ICU patients. For smoothed data, the Holt-Winters model achieved a value of 0.058 for daily confirmed cases, while ARIMA attained a value of 0.058 for the daily number of confirmed deaths and 0.013 for the daily number of ICU patients. Among ensemble models, the SVM-based stacking ensemble achieved error values of 0.235 for the daily number of confirmed cases, 0.118 for the daily number of deaths, and 0.019 for the daily number of ICU patients on raw data. For smoothed data, the average ensemble and weighted average ensemble achieved 0.060 for the daily number of confirmed cases and 0.013 for daily ICU patients. The ensemble models also generalised well when applied to data from the USA.

For raw data, GAM (0.244) predicted daily confirmed cases best, time series Poisson (0.172) predicted daily confirmed deaths best, and both ARIMA and time series Poisson (0.022) predicted daily ICU patients best, based on WMAPE. For smoothed data, time series Poisson predicted daily confirmed cases best (0.065), while ARIMA best predicted daily confirmed deaths (0.058) and ICU patients (0.013). Among ensemble models, the stacking ensemble using SVM was the best model for predicting daily confirmed cases (0.228), deaths (0.11), and ICU patients (0.02). With smoothed data, the average ensemble and WAE were the best models for predicting daily confirmed cases (0.058) and ICU patients (0.011). The predictive performance of the ensemble models also generalised to another country when evaluated on the USA data.

CONCLUSIONS

No single model performed consistently. While the ensemble models did not always provide the best predictions, a comparison of first-best and second-best models showed that they performed considerably better than the single models. When an ensemble model was not the best-performing model, its performance was still close to that of the best single model: the mean and variance of the error measures show that ensemble models provided stable predictions, with less variation in performance than single models. These results can be used to inform policymaking during future pandemics.
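
The stability comparison described above, based on the mean and variance of an error measure, can be illustrated with a short Python sketch. The model names and WMAPE values below are made-up placeholders, not results from the study, and `stability_summary` is a hypothetical helper; the abstract does not say exactly how the authors aggregated their error measures.

```python
from statistics import mean, pvariance


def stability_summary(errors_by_model):
    """Map each model to (mean error, population variance of error) across targets."""
    return {
        name: (mean(errors), pvariance(errors))
        for name, errors in errors_by_model.items()
    }


# Placeholder WMAPE values for three prediction targets (not from the paper).
hypothetical_wmape = {
    "single_model_a": [0.10, 0.40, 0.10],
    "single_model_b": [0.40, 0.10, 0.40],
    "ensemble": [0.20, 0.20, 0.20],
}

summary = stability_summary(hypothetical_wmape)
# List models from most to least stable (smallest variance first).
for name, (avg, var) in sorted(summary.items(), key=lambda item: item[1][1]):
    print(f"{name}: mean={avg:.3f}, variance={var:.4f}")
```

A model with a low mean and low variance is both accurate and consistent across targets; a model that wins on one target but has a high variance is a riskier default choice.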

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e15/11949510/fb6736f6b763/jogh-15-04079-F1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e15/11949510/c3914add68ab/jogh-15-04079-F2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e15/11949510/13f3c7720430/jogh-15-04079-F3.jpg
