Discovering Time-Varying Public Interest for COVID-19 Case Prediction in South Korea Using Search Engine Queries: Infodemiology Study.

Authors

Ahn Seong-Ho, Yim Kwangil, Won Hyun-Sik, Kim Kang-Min, Jeong Dong-Hwa

Affiliations

Department of Artificial Intelligence, The Catholic University of Korea, Bucheon-Si, Republic of Korea.

Department of Hospital Pathology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.

Publication

J Med Internet Res. 2024 Dec 16;26:e63476. doi: 10.2196/63476.

DOI: 10.2196/63476
PMID: 39680913
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11686031/

Abstract

BACKGROUND

The number of confirmed COVID-19 cases is a crucial indicator for policies and lifestyles. Previous studies have attempted to forecast case counts with machine learning techniques that use past case counts and search engine queries predetermined by experts. However, these approaches are limited in their ability to reflect temporal variations in queries associated with pandemic dynamics.

OBJECTIVE

This study aims to propose a novel framework that extracts keywords highly associated with COVID-19 while accounting for when they occur. We aim to extract relevant keywords that follow pandemic variations by using query expansion. Additionally, we examine time-delayed web-based search behavior related to public interest in COVID-19 and adjust for this delay to improve prediction performance.
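
Query expansion of this kind can be pictured as a nearest-neighbour lookup in each time window's embedding space. The sketch below is a minimal illustration, not the authors' implementation: it assumes word vectors for each 4-month window are already available, and the three-dimensional toy vectors, the lowercase seed `corona`, and the helper `expand_query` are hypothetical. Ranking by cosine similarity is an assumption made here; the abstract only states that the top 100 words related to "Corona" were extracted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(seed, embeddings, k):
    """Return the k words whose vectors are most cosine-similar to the seed's vector."""
    seed_vec = embeddings[seed]
    scored = [
        (word, cosine(seed_vec, vec))
        for word, vec in embeddings.items()
        if word != seed  # the seed itself is not an expansion term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


# Hypothetical toy vectors for two 4-month windows. In the study, each window's
# vectors would come from an embedding model trained on that window's news text.
window_a = {
    "corona": [0.9, 0.1, 0.0],
    "fever": [0.8, 0.2, 0.1],
    "mers": [0.85, 0.05, 0.1],
    "lockdown": [0.3, 0.9, 0.1],
    "economy": [0.1, 0.2, 0.9],
}
window_b = {
    "corona": [0.2, 0.3, 0.9],
    "fever": [0.8, 0.2, 0.1],
    "mers": [0.85, 0.05, 0.1],
    "lockdown": [0.3, 0.9, 0.1],
    "economy": [0.1, 0.2, 0.95],
}

print(expand_query("corona", window_a, 2))  # ['mers', 'fever']
print(expand_query("corona", window_b, 2))  # ['economy', 'lockdown']
```

With these made-up vectors, the same seed retrieves symptom- and outbreak-related neighbours in one window and economy- and policy-related neighbours in the other, which is the kind of shift the framework is meant to capture. Matching the abstract would mean calling it with `k=100` on each window's own vectors.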

METHODS

To capture temporal semantics regarding COVID-19, word embedding models were trained on a news corpus, and the top 100 words related to "Corona" were extracted over 4-month windows. Time-lagged cross-correlation was then applied to the expanded queries to select the optimal time lags correlated with confirmed cases. Subsequently, principal component analysis was used to reduce the dimensionality of the time-lagged features, and ElasticNet regression models were trained to predict future daily case counts.
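
The lag-selection step can be sketched as follows for a single expanded query. This is a minimal sketch under stated assumptions: the helper names, the `max_lag` search range, and the rule of keeping the lag with the highest positive Pearson r are choices made here, since the abstract does not give the lag range or the exact selection criterion. The later PCA and ElasticNet steps are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_lag(query_volume, cases, max_lag):
    """Return (lag, r) for the lag whose shifted query series best correlates with cases.

    A lag of L pairs the query volume on day t with the case count on day t + L.
    """
    best = None
    for lag in range(max_lag + 1):
        x = query_volume[: len(query_volume) - lag]
        y = cases[lag:]
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Hypothetical daily series in which search volume leads cases by two days.
cases = [1, 2, 5, 9, 6, 3, 2, 4, 8, 12, 7, 3]
query_volume = [5, 9, 6, 3, 2, 4, 8, 12, 7, 3, 2, 1]
lag, r = best_lag(query_volume, cases, max_lag=5)
print(lag, round(r, 3))  # 2 1.0
```

Keeping the highest positive r, rather than the largest absolute r, follows the abstract's emphasis on keywords with high positive correlation; `max_lag` should stay well below the series length so each correlation is computed over enough overlapping days.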

RESULTS

Our approach successfully extracted relevant keywords depending on the pandemic phase, encompassing keywords directly related to COVID-19, such as its symptoms, as well as keywords reflecting its societal impact. Specifically, during the first outbreak, keywords directly linked to COVID-19 and to past infectious disease outbreaks similar to COVID-19 exhibited a high positive correlation. In the second phase of the pandemic, as community infections emerged, keywords related to the government's pandemic control policies frequently showed a high positive correlation. In the third phase of the pandemic, during the delta variant outbreak, keywords such as "economic crisis" and "anxiety" appeared, reflecting public fatigue. Consequently, prediction models trained on the queries extracted over 4-month windows outperformed previous methods for most predictions 1-14 days ahead. Notably, unlike the heuristic- and symptom-based query sets, our approach showed significantly higher Pearson correlation coefficients than models based solely on past case counts for predictions 9-11 days ahead (P=.02, P<.01, and P<.01, respectively).
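
Forecasts here are compared per horizon (1-14 days ahead) by the Pearson correlation between predicted and observed daily counts. The sketch below shows one way to align an h-day-ahead target and score it, assuming one feature vector per day; `make_horizon_pairs` and every number are hypothetical, the regression model that would produce `predicted` is omitted, and the significance test used to compare correlation coefficients is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def make_horizon_pairs(features, cases, horizon):
    """Pair the features observed on day t with the case count on day t + horizon."""
    if not 0 < horizon < len(cases):
        raise ValueError("horizon must be between 1 and len(cases) - 1")
    X = features[: len(cases) - horizon]
    y = cases[horizon:]
    return X, y


# Hypothetical daily features and case counts, aligned for a 2-day horizon.
features = [[1.0], [2.0], [3.0], [4.0], [5.0]]
cases = [10, 20, 30, 40, 50]
X, y = make_horizon_pairs(features, cases, horizon=2)
print(X, y)  # [[1.0], [2.0], [3.0]] [30, 40, 50]

# Invented predictions scored against invented observed counts.
predicted = [10, 12, 15, 14, 20]
observed = [11, 13, 14, 15, 21]
print(round(pearson(predicted, observed), 3))  # 0.972
```

Building a separate training set per horizon in this way lets the same feature pipeline be evaluated at each of the 14 horizons independently.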

CONCLUSIONS

This study proposes a novel COVID-19 case-prediction model that automatically extracts relevant queries over time using word embeddings. The model outperformed previous methods that relied on static symptom-based or heuristic queries, even without prior expert knowledge. The results demonstrate that our approach can track temporal shifts in public interest as the pandemic evolves.

Similar articles

1
Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.
J Med Internet Res. 2016 Jul 4;18(7):e177. doi: 10.2196/jmir.4955.
2
Association of Search Query Interest in Gastrointestinal Symptoms With COVID-19 Diagnosis in the United States: Infodemiology Study.
JMIR Public Health Surveill. 2020 Jul 17;6(3):e19354. doi: 10.2196/19354.
3
Predicting New Daily COVID-19 Cases and Deaths Using Search Engine Query Data in South Korea From 2020 to 2021: Infodemiology Study.
J Med Internet Res. 2021 Dec 22;23(12):e34178. doi: 10.2196/34178.
4
Understanding the Community Risk Perceptions of the COVID-19 Outbreak in South Korea: Infodemiology Study.
J Med Internet Res. 2020 Sep 29;22(9):e19788. doi: 10.2196/19788.
5
Public Interest in Immunity and the Justification for Intervention in the Early Stages of the COVID-19 Pandemic: Analysis of Google Trends Data.
J Med Internet Res. 2021 Jun 18;23(6):e26368. doi: 10.2196/26368.
6
Effective Training Data Extraction Method to Improve Influenza Outbreak Prediction from Online News Articles: Deep Learning Model Study.
JMIR Med Inform. 2021 May 25;9(5):e23305. doi: 10.2196/23305.
7
Mapping of Health Literacy and Social Panic Via Web Search Data During the COVID-19 Public Health Emergency: Infodemiological Study.
J Med Internet Res. 2020 Jul 2;22(7):e18831. doi: 10.2196/18831.
8
Association Between What People Learned About COVID-19 Using Web Searches and Their Behavior Toward Public Health Guidelines: Empirical Infodemiology Study.
J Med Internet Res. 2021 Sep 2;23(9):e28975. doi: 10.2196/28975.
9
An ensemble approach improves the prediction of the COVID-19 pandemic in South Korea.
J Glob Health. 2025 Mar 28;15:04079. doi: 10.7189/jogh.15.04079.
