School of Data and Computer Science, Shandong Women's University, 2399 Daxue Road, Changqing District, Ji'nan, 250300, Shandong, China.
Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong Center for Disease Control and Prevention, 16992 Jingshi Road, Lixia District, Ji'nan, 250014, Shandong, China.
BMC Public Health. 2024 Oct 31;24(1):3014. doi: 10.1186/s12889-024-20532-7.
Infectious diseases are major medical and social challenges of the 21st century. Accurate incidence prediction helps public health organizations prevent the spread of disease. Internet search engine data, such as the Baidu search index, may be useful for analyzing epidemics and improving prediction.
We collected data on hepatitis E incidence and cases in Shandong province from January 2009 to December 2022. The Baidu index was available for the same period. Using Pearson correlation analysis, we validated the relationship between the Baidu index and hepatitis E incidence. We used several LSTM architectures (LSTM, stacked LSTM, attention-based LSTM, and attention-based stacked LSTM) to forecast hepatitis E incidence both with and without the Baidu index. We also introduced KAN into the LSTM models to improve their nonlinear learning capability. Model performance was evaluated with three standard metrics: root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
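The correlation measure and the three error metrics named above have standard textbook definitions, sketched here in plain Python for reference. This is a minimal illustration, not the authors' code: the function names and the toy numbers in the usage note are assumptions, and MAPE is expressed in percent under the assumption that no observed value is zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no zero in `actual`)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

For example, with hypothetical observed values `[2.0, 4.0, 5.0]` and forecasts `[1.0, 4.0, 7.0]`, `mae` averages the absolute errors 1, 0, and 2, while `mape` averages the relative errors 50%, 0%, and 40%.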
After adjusting the Baidu index, its correlation with hepatitis E incidence changed from -0.1654 to 0.1733. Without the Baidu index, LSTM and attention-based stacked LSTM achieved MAPEs of 17.04±0.13% and 17.19±0.57%, respectively. With the Baidu index, the same methods achieved MAPEs of 15.36±0.16% and 15.15±0.07%, a reduction of about 2 percentage points. Models with KAN improved performance by 0.3%. More detailed results are given in the Results section of the paper.
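The size of the improvement can be checked directly from the reported mean MAPE values. The short sketch below computes both the absolute reduction (in percentage points) and the relative reduction for each model. It uses only the means quoted above and ignores the ± spread; the variable names are illustrative.

```python
# Mean MAPE values (percent) as reported in the abstract: (without, with) the Baidu index.
reported_mape = {
    "LSTM": (17.04, 15.36),
    "attention-based stacked LSTM": (17.19, 15.15),
}

improvements = {}
for model, (without_index, with_index) in reported_mape.items():
    absolute = without_index - with_index          # percentage points
    relative = 100.0 * absolute / without_index    # percent of the baseline MAPE
    improvements[model] = (round(absolute, 2), round(relative, 2))

for model, (absolute, relative) in improvements.items():
    print(f"{model}: {absolute} points absolute, {relative}% relative")
```

The absolute reductions come out close to 2 percentage points, which appears to be the sense of the reported 2% gain; the relative reduction in error is larger because the baseline MAPE is around 17%.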
Our experiments reveal a weak correlation and similar trends between the Baidu index and hepatitis E incidence. The Baidu index proves valuable for predicting hepatitis E incidence. Furthermore, stacked layers and KAN can also improve the representational ability of LSTM models.