Wu Hongyan, Cai Yunpeng, Wu Yongsheng, Zhong Ren, Li Qi, Zheng Jing, Lin Denan, Li Ye
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences.
Shenzhen Center for Disease Control and Prevention.
Biosci Trends. 2017 Jul 24;11(3):292-296. doi: 10.5582/bst.2017.01035. Epub 2017 May 8.
Influenza, a disease caused by a respiratory virus, sickened 5,043,127 citizens in Shenzhen, China, from January 2014 to April 2016. Accurate forecasting of outbreaks of influenza-like illness (ILI; here ILI refers to upper respiratory infection) could help public health officials recommend public health actions earlier. In this study, a random forest regression model built on one year of historical factors was used to forecast the weekly ILI rate from clinical data provided by the Shenzhen Health Information Center. The following conclusions were drawn: i) Compared to prediction with 52 weekly (one-year) historical observations, adding another 52 first-order difference variables improved accuracy: mean absolute percentage error (MAPE) decreased from 5.04% to 4.35% and mean squared error (MSE) decreased from 2.85E-04 to 1.97E-04. ii) The first-order difference variables seemed more significant for the prediction than the original historical observations. In addition, both the recent observations and the later observations seemed important in the prediction procedure. iii) Weather conditions, whose influence may already be implied by the historical observations, seemed insignificant for the prediction, although Pearson correlation analysis showed a correlation with the weekly average temperature and the weekly maximum temperature. The correlation coefficients were -0.3656 and -0.3583, respectively.
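The feature construction and error measures described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_features`, `mape`, `mse`, `pearson`) are hypothetical, the weekly series is synthetic, and fitting the forest itself is left to a library such as scikit-learn's `RandomForestRegressor`.

```python
import math

def build_features(series, n_lags=52):
    """Build rows of n_lags past values plus n_lags first-order differences.

    Assumption: the target for week t is series[t], and the predictors are
    the n_lags preceding observations and their week-to-week differences.
    """
    X, y = [], []
    # One extra leading observation is needed so every lag has a difference.
    for t in range(n_lags + 1, len(series)):
        window = series[t - n_lags - 1:t]   # n_lags + 1 values before week t
        lags = list(window[1:])             # the n_lags most recent values
        diffs = [b - a for a, b in zip(window, window[1:])]
        X.append(lags + diffs)              # 2 * n_lags predictors per row
        y.append(series[t])
    return X, y

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Synthetic weekly ILI rate with a yearly cycle (illustrative data only).
ili_rate = [0.05 + 0.01 * math.sin(2 * math.pi * t / 52) for t in range(160)]
X, y = build_features(ili_rate, n_lags=52)
print(len(X), len(X[0]))  # number of training rows, predictors per row
```

Each row of `X` holds 104 predictors (52 lagged values and 52 differences), matching the two variable groups compared in conclusion i). The rows could then be passed to a regressor, and held-out predictions scored with `mape` and `mse`.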