Department of Management Science and Information System, Faculty of Management and Economics, Kunming University of Science and Technology, Kunming, China.
School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
J Med Internet Res. 2023 Feb 13;25:e44238. doi: 10.2196/44238.
In megacities, there is an urgent need to establish more sensitive forecasting and early warning methods for acute respiratory infectious diseases. Existing prediction and early warning models for influenza and other acute respiratory infectious diseases have limitations, leaving room for improvement.
The aim of this study was to explore a new and better-performing deep-learning model to predict influenza trends from multisource heterogeneous data in a megacity.
We collected multisource heterogeneous data from the 26th week of 2012 to the 25th week of 2019, including influenza-like illness (ILI) case and virological surveillance data, climate and demographic data, and search engine data. To avoid collinearity, we selected the best predictor according to the weight and correlation of each factor. We established a new multiattention long short-term memory (LSTM) deep-learning model (MAL model), which was used to predict the percentage of ILI cases (ILI%) and the product of ILI% and the influenza-positive rate (ILI%×positive%), respectively. We also combined the data in different forms and, for comparison, added several machine-learning and deep-learning models commonly used to predict influenza trends. The R value, explained variance score, mean absolute error, and mean squared error were used to evaluate the quality of the models.
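The four evaluation measures named above can be illustrated with a minimal sketch. It assumes that the "R value" is the Pearson correlation between observed and predicted values, which the abstract does not state explicitly; the function name `evaluate_forecast` and the sample numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def evaluate_forecast(y_true, y_pred):
    """Return R, explained variance score, MAE, and MSE for one forecast.

    Assumption: "R" is taken to be the Pearson correlation coefficient.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred

    mae = np.mean(np.abs(residual))            # mean absolute error
    mse = np.mean(residual ** 2)               # mean squared error
    # Explained variance: 1 - Var(residual) / Var(observed)
    evs = 1.0 - np.var(residual) / np.var(y_true)
    r = np.corrcoef(y_true, y_pred)[0, 1]      # Pearson correlation

    return {"r": r, "explained_variance": evs, "mae": mae, "mse": mse}

# Hypothetical weekly ILI% values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
scores = evaluate_forecast(observed, predicted)
```

In this toy case the prediction is offset by a constant, so R and the explained variance score stay at their maximum while the mean absolute error and mean squared error do not, which is one reason to report the error measures alongside the correlation-type measures.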
ILI% correlated most strongly with the Baidu search data, and ILI%×positive% correlated most strongly with air quality. We first used the MAL model to calculate the ILI%, and then combined ILI% with climate, demographic, and Baidu data in different forms. The ILI%+climate+demography+Baidu model had the best prediction effect, with an explained variance score of 0.78, R of 0.76, mean absolute error of 0.08, and mean squared error of 0.01. Similarly, we used the MAL model to calculate the ILI%×positive% and combined this prediction with different data forms. The ILI%×positive%+climate+demography+Baidu model had the best prediction effect, with an explained variance score of 0.74, R of 0.70, mean absolute error of 0.02, and mean squared error of 0.02. Comparisons with random forest, extreme gradient boosting, LSTM, and gated recurrent unit models showed that the MAL model had the best prediction effect.
The newly established MAL model outperformed existing models. Natural factors and search engine query data were more helpful in forecasting ILI patterns in megacities. More timely and effective prediction of influenza and other respiratory infectious diseases, and of their epidemic intensity, allows earlier and better preparedness and can thereby reduce harm to population health.