BACKGROUND

Infectious diseases are major medical and social challenges of the 21st century. Accurate incidence prediction is important for public health organizations seeking to prevent the spread of disease. Internet search engine data, such as the Baidu search index, may be useful for analyzing epidemics and improving forecasts.

METHODS

We collected hepatitis E incidence and case data for Shandong Province from January 2009 to December 2022, together with Baidu index data covering the same period. Using Pearson correlation analysis, we assessed the relationship between the Baidu index and hepatitis E incidence. We used several LSTM architectures (LSTM, stacked LSTM, attention-based LSTM, and attention-based stacked LSTM) to forecast hepatitis E incidence both with and without the Baidu index as an input. We also incorporated Kolmogorov-Arnold Networks (KAN) into the LSTM models to improve their nonlinear learning capability. Model performance was evaluated with three standard metrics: root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
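The correlation analysis and the three evaluation metrics named above have standard definitions. Below is a minimal pure-Python sketch of them. The function names (`pearson_r`, `rmse`, `mae`, `mape`) are illustrative and not taken from the paper, and the toy series in the demo are made-up numbers, not study data. MAPE is expressed in percent, matching how the results are reported, and is undefined when any observed value is zero.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two standard-deviation terms (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up toy values for illustration only (not data from the study).
    observed = [2.0, 4.0, 5.0, 4.0]
    predicted = [2.5, 3.5, 5.0, 4.5]
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 1.0
    print(rmse(observed, predicted))  # about 0.433
    print(mae(observed, predicted))  # 0.375
    print(mape(observed, predicted))  # 12.5
```

Lower values are better for all three metrics. MAPE is scale-free, which makes it convenient for comparing models, while RMSE penalizes large errors more heavily than MAE does.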

RESULTS

After adjusting the Baidu index, the correlation between hepatitis E incidence and the Baidu index changed from -0.1654 to 0.1733. Without the Baidu index, LSTM and attention-based stacked LSTM achieved MAPEs of 17.04±0.13% and 17.19±0.57%, respectively. With the Baidu index, the same methods achieved MAPEs of 15.36±0.16% and 15.15±0.07%, respectively, a reduction of roughly 2 percentage points. Adding KAN to the models improved performance by a further 0.3%. Detailed results are reported in the Results section of the paper.
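The size of the improvement from adding the Baidu index can be checked directly from the reported mean MAPE values. The short sketch below computes both the absolute drop (in percentage points) and the relative drop; it uses only the means stated in the abstract and ignores the ± spread, and the helper name `mape_reduction` is hypothetical.

```python
# Mean MAPE values (percent) stated in the abstract: (without Baidu index, with Baidu index).
reported = {
    "LSTM": (17.04, 15.36),
    "attention-based stacked LSTM": (17.19, 15.15),
}


def mape_reduction(without_idx: float, with_idx: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    drop_pp = without_idx - with_idx
    relative = 100.0 * drop_pp / without_idx
    return drop_pp, relative


if __name__ == "__main__":
    for model, (without_idx, with_idx) in reported.items():
        pp, rel = mape_reduction(without_idx, with_idx)
        print(f"{model}: {pp:.2f} percentage points ({rel:.1f}% relative)")
```

This gives drops of 1.68 and 2.04 percentage points, or roughly 10% and 12% relative to the baseline MAPE, so the "2%" figure is best read as an absolute change in MAPE.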

CONCLUSIONS

Our experiments reveal a weak correlation but similar trends between the Baidu index and hepatitis E incidence. The Baidu index proves valuable for predicting hepatitis E incidence. Furthermore, stacked layers and KAN can further improve the representational capacity of LSTM models.
