Complexity Science Institute, Qingdao University, Qingdao, Shandong, China.
State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China.
PLoS One. 2021 Dec 22;16(12):e0261629. doi: 10.1371/journal.pone.0261629. eCollection 2021.
Hand, foot and mouth disease (HFMD) is an increasingly serious public health problem, with outbreaks occurring in China every year since 2008. Predicting HFMD incidence and analyzing the factors that influence it are of great significance for prevention. Machine learning has shown advantages in infectious disease modeling, but few machine learning studies of HFMD incidence cover all the provinces of mainland China. In this study, we applied two machine learning algorithms, Random Forest and eXtreme Gradient Boosting (XGBoost), for analysis and prediction. We first used Random Forest to examine the association between HFMD incidence and potential influential factors in 31 provinces of mainland China. Next, we built Random Forest and XGBoost prediction models using meteorological and social factors as predictors. Finally, we applied the prediction models to four different regions of mainland China and evaluated their performance. Our results show that: 1) Meteorological and social factors jointly affect HFMD incidence in mainland China, with average temperature and population density being the two most significant influential factors; 2) The delayed effect of population flux on HFMD incidence differs across regions, and from a national perspective the model using population flux data delayed by one month has better prediction performance; 3) Overall, the XGBoost model predicted better than the Random Forest model and is more suitable for predicting HFMD incidence in mainland China.
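The comparison of delayed population flux described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: a one-variable least-squares line stands in for the Random Forest and XGBoost models, the fit is scored in-sample rather than on held-out data, and the monthly values are synthetic. All names (`lag_series`, `score_lag`, `flux`, `incidence`) are hypothetical and chosen only for illustration.

```python
import math


def lag_series(values, k):
    """Delay a monthly series by k months; the first k months have no lagged value."""
    if k < 0:
        raise ValueError("lag must be non-negative")
    n = len(values)
    k = min(k, n)
    return [None] * k + list(values[: n - k])


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (stand-in for a real model)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def score_lag(flux, incidence, k):
    """In-sample RMSE when incidence is predicted from flux delayed by k months."""
    lagged = lag_series(flux, k)
    pairs = [(x, y) for x, y in zip(lagged, incidence) if x is not None]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    a, b = fit_line(xs, ys)
    return rmse(ys, [a + b * x for x in xs])


# Synthetic monthly data (assumption): incidence follows last month's flux exactly.
flux = [10, 14, 9, 20, 17, 12, 25, 11, 16, 22, 13, 19]
incidence = [30] + [1 + 2 * f for f in flux[:-1]]

# Score delays of 0, 1 and 2 months and keep the one with the lowest error.
scores = {k: score_lag(flux, incidence, k) for k in range(3)}
best_lag = min(scores, key=scores.get)
print("best lag (months):", best_lag)
```

In a fuller analysis the lagged column would be one of several meteorological and social predictors, and the error would be computed on months the model had not seen.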