Department of Epidemiology and Biostatistics, Guangxi Medical University, 22 Shuangyong Road, Qingxiu District, Nanning, Guangxi, 530021, China.
Institute of Life Science, Guangxi Medical University, Nanning, China.
BMC Infect Dis. 2024 Sep 19;24(1):1006. doi: 10.1186/s12879-024-09940-7.
It is difficult to detect outbreaks of emerging infectious diseases with the existing surveillance system. Here we investigate the utility of the Baidu Search Index, an indicator of a keyword's share of Baidu's search volume, for early warning and for predicting the epidemic trend of COVID-19.
The daily number of cases and the Baidu Search Index of 8 keywords (weighted by population) from December 1, 2019 to March 15, 2020 were collected and analyzed using time series analysis and Spearman correlation with different time lags. To predict the daily number of COVID-19 cases from the Baidu Search Index, a zero-inflated negative binomial regression model was used in phase 1 and a negative binomial regression model was used in phases 2 and 3, based on the characteristics of the independent variable.
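The lagged Spearman analysis described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`average_ranks`, `spearman`, `lagged_spearman`) and the toy series are assumptions introduced here for illustration, and the toy data are constructed so that the search series leads the case series by two days.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def lagged_spearman(search_index, cases, max_lag):
    """Correlate search_index[t] with cases[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = search_index[: len(search_index) - lag]
        y = cases[lag:]
        result[lag] = spearman(x, y)
    return result

# Hypothetical toy data (not from the study): cases follow searches by 2 days.
toy_search = [1, 3, 2, 5, 4, 7, 6, 9, 8, 10]
toy_cases = [0.5, 0.2] + toy_search[:8]

corr_by_lag = lagged_spearman(toy_search, toy_cases, max_lag=4)
best_lag = max(corr_by_lag, key=corr_by_lag.get)
```

Here `best_lag` is the lag with the highest correlation, which is one plausible way to read a "lead time" off such an analysis; the abstract does not state how the authors selected theirs.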
The Baidu Search Index of all keywords was significantly higher in Wuhan than in Hubei (excluding Wuhan) and China (excluding Hubei). Before the causative pathogen was identified, the search volume of "Influenza" and "Pneumonia" in Wuhan increased with the number of new onset cases, with correlation coefficients of 0.69 and 0.59, respectively. After the pathogen was made public but before COVID-19 was classified as a notifiable disease, the search volume of "SARS", "Pneumonia" and "Coronavirus" in all study areas increased with the number of new onset cases, with correlation coefficients of 0.69 to 0.89, while "Influenza" became negatively correlated (r: -0.56 to -0.64). After COVID-19 was closely monitored, the Baidu Search Index of "COVID-19", "Pneumonia", "Coronavirus", "SARS" and "Mask" could predict the epidemic trend with lead times of 15 days, 5 days and 6 days in Wuhan, Hubei (excluding Wuhan) and China (excluding Hubei), respectively. From 21 January to 9 February, the predicted number of cases would be 1.84-fold and 4.81-fold higher than the actual number of cases in Wuhan and Hubei (excluding Wuhan), respectively.
The Baidu Search Index could be used for early warning and for predicting the epidemic trend of COVID-19, but the informative search keywords changed between periods. Considering the time lag from onset to diagnosis, especially in areas with a shortage of medical resources, internet search data can be a highly effective supplement to the existing surveillance system.