Li Jizhen, Li Yuhong, Ye Ming, Yao Sanqiao, Yu Chongchong, Wang Lei, Wu Weidong, Wang Yongbin
Department of Epidemiology and Health Statistics, School of Public Health, Xinxiang Medical University, Xinxiang, Henan Province, People's Republic of China.
National Center for Tuberculosis Control and Prevention, China Center for Disease Control and Prevention, Beijing, People's Republic of China.
Infect Drug Resist. 2021 May 25;14:1941-1955. doi: 10.2147/IDR.S299704. eCollection 2021.
The purpose of this study was to develop a novel data-driven hybrid model that fuses ensemble empirical mode decomposition (EEMD), the seasonal autoregressive integrated moving average (SARIMA) model, and the nonlinear autoregressive artificial neural network (NARNN), called the EEMD-SARIMA-NARNN model, to assess and forecast the epidemic patterns of tuberculosis (TB) in Tibet.
Monthly TB incidence data from January 2006 to December 2017 were obtained, and the time series was partitioned into a training subsample (January 2006 to December 2016) and a testing subsample (January to December 2017). The training set was used to develop the EEMD-SARIMA-NARNN combined model, whereas the testing set was used to validate its forecasting performance. The forecasting accuracy of this novel method was compared with that of the basic SARIMA model, the basic NARNN model, the error-trend-seasonal (ETS) model, and the traditional SARIMA-NARNN mixture model.
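The hold-out design described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `month_labels` and `split_holdout` are hypothetical, and month labels stand in for the incidence values, which are not reproduced here.

```python
def month_labels(start_year, end_year):
    """Return 'YYYY-MM' labels for every month from start_year to end_year inclusive."""
    return [f"{year}-{month:02d}"
            for year in range(start_year, end_year + 1)
            for month in range(1, 13)]


def split_holdout(series, horizon=12):
    """Split a time-ordered sequence into (training, testing) parts.

    The last `horizon` observations form the testing part; the order is
    preserved because shuffling would leak future values into training.
    """
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


# Placeholder series: one label per month, January 2006 to December 2017.
months = month_labels(2006, 2017)
train, test = split_holdout(months, horizon=12)
```

With 144 monthly observations, this leaves 132 months for model fitting and the final 12 months (2017) for out-of-sample validation.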
Across all forecasting accuracy measures, namely root-mean-square error, mean absolute deviation, mean error rate, mean absolute percentage error, and root-mean-square percentage error, the EEMD-SARIMA-NARNN combined method produced lower error rates than the other models. The descriptive statistics suggested that TB was a seasonal disease, with a peak in late winter and early spring and a trough in autumn and early winter. TB incidence in Tibet increased drastically, by a factor of 1.7, from 2006 to 2017, with an average annual percentage change of 5.8% (95% confidence interval: 3.5-8.1).
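The five accuracy measures named above can be computed as in the sketch below. The abstract does not give the formulas, so the definitions here are assumptions based on common usage; in particular, the mean error rate is taken to be the mean absolute deviation divided by the mean of the observed values. The function name `forecast_errors` and the toy numbers are illustrative only and are not data from the study.

```python
import math


def forecast_errors(actual, predicted):
    """Return RMSE, MAD, MER, MAPE and RMSPE for paired observations.

    Assumed definitions (not taken from the source):
      RMSE  = sqrt(mean((a - p)^2))
      MAD   = mean(|a - p|)
      MER   = MAD / mean(a)
      MAPE  = mean(|a - p| / |a|)
      RMSPE = sqrt(mean(((a - p) / a)^2))
    Observed values must be non-zero for the percentage measures.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mad = sum(abs(e) for e in errors) / n
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mad": mad,
        "mer": mad / (sum(actual) / n),
        "mape": sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        "rmspe": math.sqrt(sum((e / a) ** 2 for e, a in zip(errors, actual)) / n),
    }


# Toy values for illustration only (not study data).
scores = forecast_errors([100.0, 200.0], [110.0, 180.0])
```

Lower values indicate better forecasts on every measure, so candidate models can be ranked by evaluating each one on the same testing subsample.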
This novel data-driven hybrid method captures both the linear and nonlinear components of TB incidence better than the other methods used in this study, which makes it useful for estimating and forecasting future epidemic trends of TB in Tibet. Moreover, given present trends, strict precautionary measures are required to reduce the spread of TB in Tibet.