Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model.

Affiliations

Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China.

Liaoning Provincial Center for Disease Control and Prevention, Shenyang, Liaoning, China.

Publication information

BMC Infect Dis. 2021 Aug 19;21(1):839. doi: 10.1186/s12879-021-06503-y.

Abstract

BACKGROUND

Hemorrhagic fever with renal syndrome (HFRS) continues to attract public attention because of outbreaks in various cities in China. Predicting future outbreaks or epidemics from past incidence data can help health departments take targeted preventive measures in advance. In this study, we propose a multistep prediction strategy for HFRS based on extreme gradient boosting (XGBoost) as an extension of the one-step prediction model. We also compare the fitting and prediction accuracy of the XGBoost model with that of the autoregressive integrated moving average (ARIMA) model using several evaluation indicators.
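The abstract contrasts two kinds of forecasting. One-step prediction uses observed values up to the previous period for each forecast. Multistep prediction forecasts a whole horizon (for example, every month of 2018) from data that ends in 2017. The abstract does not say how the multistep XGBoost model was built. The sketch below assumes the common recursive (iterated) strategy, in which each prediction is fed back as an input for the next step. `recursive_forecast` and the toy averaging model are hypothetical stand-ins for a fitted XGBoost regressor.

```python
from typing import Callable, List, Sequence

# A one-step model maps the most recent n_lags values (oldest first) to the next value.
OneStepModel = Callable[[Sequence[float]], float]


def recursive_forecast(history: Sequence[float], model: OneStepModel,
                       n_lags: int, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead by feeding each prediction back as an input.

    Assumption: recursive (iterated) multistep strategy; the abstract does not
    specify which multistep strategy the authors used.
    """
    if n_lags < 1 or len(history) < n_lags:
        raise ValueError("need n_lags >= 1 and at least n_lags observations")
    window = list(history[-n_lags:])  # most recent observed values
    forecasts: List[float] = []
    for _ in range(horizon):
        next_value = model(window)          # one-step-ahead prediction
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # drop the oldest value, append the prediction
    return forecasts


def mean_of_last_two(window: Sequence[float]) -> float:
    """Hypothetical toy model standing in for a fitted XGBoost regressor."""
    return (window[-1] + window[-2]) / 2


if __name__ == "__main__":
    print(recursive_forecast([10.0, 20.0, 30.0], mean_of_last_two, n_lags=2, horizon=3))
```

Running the file prints `[25.0, 27.5, 26.25]`. Because each forecast becomes an input to the next step, errors can accumulate over the horizon. For this reason, multistep accuracy is usually reported separately from one-step accuracy.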

METHODS

We collected HFRS incidence data for mainland China from 2004 to 2018. The data from 2004 to 2017 were used as the training set to build the seasonal ARIMA model and the XGBoost model, and the 2018 data were used to test prediction performance. In the multistep XGBoost forecasting model, one-hot encoding was used to handle seasonal features. A series of evaluation indices was then used to assess the accuracy of the multistep XGBoost forecasting model.
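The sketch below shows how a supervised training table with a one-hot encoded season might be built. It assumes monthly incidence values. Each row's features are a fixed number of lagged values followed by a 12-column one-hot indicator for the target month. The lag count, the function names `one_hot_month` and `build_features`, and the feature layout are illustrative assumptions, not the authors' published configuration.

```python
from typing import List, Sequence, Tuple


def one_hot_month(month: int) -> List[float]:
    """Encode a calendar month (1-12) as a 12-element one-hot vector."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    vec = [0.0] * 12
    vec[month - 1] = 1.0
    return vec


def build_features(values: Sequence[float], months: Sequence[int],
                   n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Build supervised rows from a monthly series.

    Each row holds n_lags lagged values (oldest first) followed by the one-hot
    month of the target period; the target is the value at that period.
    Hypothetical layout for illustration only.
    """
    if len(values) != len(months):
        raise ValueError("values and months must have the same length")
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(values)):
        lags = [float(v) for v in values[t - n_lags:t]]  # values before period t
        X.append(lags + one_hot_month(months[t]))        # append seasonal indicator
        y.append(float(values[t]))
    return X, y


if __name__ == "__main__":
    # Hypothetical counts for January-April; the target months are March and April.
    X, y = build_features([120, 95, 80, 70], [1, 2, 3, 4], n_lags=2)
    print(len(X), len(X[0]), y)
```

Under the split described above, rows whose target month falls in 2018 would form the test set and the remaining rows the training set. After conversion to NumPy arrays, the resulting `X` and `y` could be passed to a gradient-boosting regressor such as `xgboost.XGBRegressor`. The abstract does not give the hyperparameters. In a recursive multistep forecast the month of each future target is known in advance, so its one-hot columns can be filled directly.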

RESULTS

There were 200,237 HFRS cases in China from 2004 to 2018. A long-term downward trend and bimodal seasonality were identified in the original time series. The ARIMA (3, 1, 0) × (1, 1, 0) model was selected as optimal because it had the minimum corrected Akaike information criterion (CAIC) value. In the fitting part, the ME, RMSE, MAE, MPE, MAPE, and MASE values of the XGBoost model were higher than those of the ARIMA model, whereas the RMSE of the XGBoost model was lower. For both one-step and multistep prediction, the XGBoost model's prediction performance indicators (MAE, MPE, MAPE, RMSE, and MASE) were all notably lower than those of the ARIMA model.
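The abstract names six accuracy indices without defining them. The sketch below computes them using common textbook definitions, which are assumptions here:

- The error is actual minus forecast.
- Percentage errors are relative to the actual value.
- MASE divides the MAE by the in-sample MAE of a naive forecast that repeats the value `season` periods earlier.

The function name `accuracy` and the sample numbers are hypothetical.

```python
import math
from typing import Dict, Sequence


def accuracy(actual: Sequence[float], forecast: Sequence[float],
             train: Sequence[float], season: int = 1) -> Dict[str, float]:
    """Compute ME, RMSE, MAE, MPE, MAPE and MASE.

    Assumed conventions: error e_t = actual - forecast; percentage errors are
    relative to the actual value; MASE divides MAE by the in-sample MAE of a
    naive forecast that repeats the value `season` periods earlier.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("percentage errors are undefined when an actual value is 0")
    if season < 1 or len(train) <= season:
        raise ValueError("need season >= 1 and train longer than season")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    pct = [100.0 * e / a for e, a in zip(errors, actual)]  # percentage errors
    mae = sum(abs(e) for e in errors) / n
    # Scale for MASE: in-sample MAE of the (seasonal) naive forecast on the training data.
    naive_errors = [abs(train[t] - train[t - season]) for t in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive in-sample MAE is 0; MASE is undefined")
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": mae,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
        "MASE": mae / scale,
    }


if __name__ == "__main__":
    # Hypothetical numbers, not the study's data.
    print(accuracy([100, 200], [110, 190], train=[10, 20, 30, 40]))
```

With monthly data, `season=12` gives the seasonal-naive scaling often used for seasonal series, and `season=1` gives the non-seasonal naive scaling. ME and MPE are signed, so they can be near zero even when MAE is large. In the example above, ME is 0.0 while MAE is 10.0.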

CONCLUSIONS

The multistep XGBoost prediction model showed much better prediction accuracy and model stability than the multistep ARIMA prediction model. The XGBoost model performed better in predicting complex, nonlinear data such as HFRS incidence. Additionally, multistep prediction models are more practical than one-step prediction models for forecasting infectious diseases.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13bc/8377883/0b07f2369665/12879_2021_6503_Fig1_HTML.jpg