School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China.
Front Public Health. 2023 Jul 18;11:1203628. doi: 10.3389/fpubh.2023.1203628. eCollection 2023.
To analyze the time-series correlation between tuberculosis (TB)-related search terms and actual incidence data in China, to screen out the "leading" terms, and to construct a timely and efficient TB prediction model that can predict the trend of the next wave of the TB epidemic in advance.
Monthly TB incidence data for Jiangsu Province, China, were collected for January 2011 to December 2020. A scoping approach was used to identify TB-related search terms covering common TB terms, prevention, symptoms, and treatment. Baidu Index data for these search terms in Jiangsu Province over the same period were collected from the Baidu Index database. Correlation coefficients between search terms and actual incidence were calculated using Python 3.6. A multiple linear regression model was constructed using SPSS 26.0, which was also used to calculate the goodness of fit and the prediction error of the model.
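The correlation screening step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`pearson`, `lagged_correlation`, `best_lead`) are hypothetical, the choice of Pearson correlation is an assumption, and so is the working definition of a "leading" term as one whose strongest correlation with incidence occurs at a positive lag. The abstract does not state the exact criterion used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def lagged_correlation(search, incidence, lag):
    """Correlate search[t] with incidence[t + lag].

    A positive lag means the search series is compared with incidence
    `lag` months later, i.e. the search term would lead incidence.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(search, incidence)
    return pearson(search[:-lag], incidence[lag:])


def best_lead(search, incidence, max_lag=3):
    """Return (lag, r) for the lag in 0..max_lag with the highest correlation."""
    scores = {
        lag: lagged_correlation(search, incidence, lag)
        for lag in range(max_lag + 1)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy monthly series (made up): incidence repeats the search index 2 months later.
search_index = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11]
incidence = [0, 0] + search_index[:-2]
lag, r = best_lead(search_index, incidence)
```

Under these assumptions, a term would be kept when its best correlation exceeds the 0.6 threshold reported in the abstract and flagged as leading when the best lag is greater than zero.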
A total of 16 keywords with correlation coefficients greater than 0.6 were identified, of which 11 were leading terms. The R of the prediction model was 0.67 and the mean absolute percentage error (MAPE) was 10.23%.
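For readers unfamiliar with the error metric, MAPE is conventionally the mean of the absolute percentage differences between observed and predicted values. The sketch below uses that standard definition; the function name `mape` and the sample numbers are illustrative only and are not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up monthly case counts and predictions, each off by 10%.
error = mape([100, 200], [90, 220])
```

Because each error is divided by the observed value, the metric is undefined for months with zero observed cases, which is why the sketch rejects such input.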
The TB prediction model based on Baidu Index data was able to predict the trend and intensity of the next wave of the TB epidemic 2 months in advance. The model is currently applicable only to Jiangsu Province.