Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study.

Affiliations

Department of Anesthesiology and Division of Global Public Health and Infectious Diseases, School of Medicine, University of California San Diego, La Jolla, CA, United States.

Global Health Policy Institute, San Diego, CA, United States.

Publication information

JMIR Public Health Surveill. 2020 Jun 8;6(2):e19509. doi: 10.2196/19509.

DOI:10.2196/19509
PMID:32490846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7282475/
Abstract

BACKGROUND

The coronavirus disease (COVID-19) pandemic is a global health emergency with over 6 million cases worldwide as of the beginning of June 2020. The pandemic is historic in scope and precedent given its emergence in an increasingly digital era. Importantly, there have been concerns about the accuracy of COVID-19 case counts due to issues such as lack of access to testing and difficulty in measuring recoveries.

OBJECTIVE

The aims of this study were to detect and characterize user-generated conversations that could be associated with COVID-19-related symptoms, experiences with access to testing, and mentions of disease recovery using an unsupervised machine learning approach.

METHODS

Tweets were collected from the Twitter public streaming application programming interface from March 3-20, 2020, filtered for general COVID-19-related keywords and then further filtered for terms that could be related to COVID-19 symptoms as self-reported by users. Tweets were analyzed using an unsupervised machine learning approach called the biterm topic model (BTM), where groups of tweets containing the same word-related themes were separated into topic clusters that included conversations about symptoms, testing, and recovery. Tweets in these clusters were then extracted and manually annotated for content analysis and assessed for their statistical and geographic characteristics.
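The filtering and biterm steps described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the keyword sets `GENERAL_TERMS` and `SYMPTOM_TERMS` are hypothetical (the abstract does not list the actual terms), the tokenizer is deliberately naive, and biterms are built from the distinct words of each tweet as a simplification. A full BTM would go on to infer topics from the pooled biterms, which is not shown here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists; the study's actual terms are not given in the abstract.
GENERAL_TERMS = {"covid", "coronavirus", "covid19"}
SYMPTOM_TERMS = {"fever", "cough", "symptoms"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?:;\"'()#@") for word in text.lower().split())
    return [token for token in tokens if token]


def two_stage_filter(tweets):
    """Keep tweets that match a general term and also a symptom term."""
    kept = []
    for tweet in tweets:
        words = set(tokenize(tweet))
        if words & GENERAL_TERMS and words & SYMPTOM_TERMS:
            kept.append(tweet)
    return kept


def biterms(text):
    """Return the unordered pairs of distinct words that co-occur in one short text."""
    tokens = sorted(set(tokenize(text)))
    return list(combinations(tokens, 2))


def count_biterms(tweets):
    """Pool biterms over the whole corpus, which is the input a BTM learns from."""
    counts = Counter()
    for tweet in tweets:
        counts.update(biterms(tweet))
    return counts
```

Pooling word pairs across the corpus, rather than modeling each tweet separately, is what makes the biterm approach suited to very short texts where per-document word counts are sparse.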

RESULTS

A total of 4,492,954 tweets were collected that contained terms that could be related to COVID-19 symptoms. After using BTM to identify relevant topic clusters and removing duplicate tweets, we identified a total of 3465 (<1%) tweets that included user-generated conversations about experiences that users associated with possible COVID-19 symptoms and other disease experiences. These tweets were grouped into five main categories including first- and secondhand reports of symptoms, symptom reporting concurrent with lack of testing, discussion of recovery, confirmation of negative COVID-19 diagnosis after receiving testing, and users recalling symptoms and questioning whether they might have been previously infected with COVID-19. The co-occurrence of tweets for these themes was statistically significant for users reporting symptoms with a lack of testing and with a discussion of recovery. A total of 63% (n=1112) of the geotagged tweets were located in the United States.
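As a quick check on the proportions reported above, the arithmetic can be reproduced directly. The share of relevant tweets follows from the two stated counts. The total number of geotagged tweets is not stated in the abstract, so the figure below is only what the reported 63% (n=1112) implies, and it is approximate because the percentage is rounded.

```python
# Counts reported in the abstract.
COLLECTED_TWEETS = 4_492_954
RELEVANT_TWEETS = 3_465
US_GEOTAGGED = 1_112
US_SHARE = 0.63  # rounded percentage as reported


def percent(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100


# Share of collected tweets retained after topic modeling and deduplication.
relevant_share = percent(RELEVANT_TWEETS, COLLECTED_TWEETS)

# Implied (not reported) total of geotagged tweets, given the rounded 63%.
implied_geotagged_total = US_GEOTAGGED / US_SHARE

print(f"relevant share: {relevant_share:.3f}%")
print(f"implied geotagged total: about {implied_geotagged_total:.0f}")
```

The retained share works out to roughly 0.077%, consistent with the reported "<1%", and the implied geotagged total is on the order of 1765 tweets.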

CONCLUSIONS

This study used unsupervised machine learning for the purposes of characterizing self-reporting of symptoms, experiences with testing, and mentions of recovery related to COVID-19. Many users reported symptoms they thought were related to COVID-19, but they were not able to get tested to confirm their concerns. In the absence of testing availability and confirmation, accurate case estimations for this period of the outbreak may never be known. Future studies should continue to explore the utility of infoveillance approaches to estimate COVID-19 disease severity.

Similar articles

1
Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study.
JMIR Public Health Surveill. 2020 Jun 8;6(2):e19509. doi: 10.2196/19509.
2
Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram.
JMIR Public Health Surveill. 2020 Aug 25;6(3):e20794. doi: 10.2196/20794.
3
Using Reports of Symptoms and Diagnoses on Social Media to Predict COVID-19 Case Counts in Mainland China: Observational Infoveillance Study.
J Med Internet Res. 2020 May 28;22(5):e19421. doi: 10.2196/19421.
4
Top Concerns of Tweeters During the COVID-19 Pandemic: Infoveillance Study.
J Med Internet Res. 2020 Apr 21;22(4):e19016. doi: 10.2196/19016.
5
Identification and characterization of tweets related to the 2015 Indiana HIV outbreak: A retrospective infoveillance study.
PLoS One. 2020 Aug 26;15(8):e0235150. doi: 10.1371/journal.pone.0235150. eCollection 2020.
6
Topics, Trends, and Sentiments of Tweets About the COVID-19 Pandemic: Temporal Infoveillance Study.
J Med Internet Res. 2020 Oct 23;22(10):e22624. doi: 10.2196/22624.
7
Temporal and Location Variations, and Link Categories for the Dissemination of COVID-19-Related Information on Twitter During the SARS-CoV-2 Outbreak in Europe: Infoveillance Study.
J Med Internet Res. 2020 Aug 28;22(8):e19629. doi: 10.2196/19629.
8
Social Network Analysis of COVID-19 Sentiments: Application of Artificial Intelligence.
J Med Internet Res. 2020 Aug 18;22(8):e22590. doi: 10.2196/22590.
9
COVID-19 and the 5G Conspiracy Theory: Social Network Analysis of Twitter Data.
J Med Internet Res. 2020 May 6;22(5):e19458. doi: 10.2196/19458.
10
Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach.
J Med Internet Res. 2020 Nov 25;22(11):e20550. doi: 10.2196/20550.

Cited by

1
Advancing infection profiling under data uncertainty through contagion potential.
PLoS One. 2025 Aug 12;20(8):e0329828. doi: 10.1371/journal.pone.0329828. eCollection 2025.
2
Beyond the Posts: Analyzing Breast Implant Illness Discourse With Natural Language Processing and Deep Learning.
Aesthet Surg J. 2025 Jun 16;45(7):745-752. doi: 10.1093/asj/sjaf047.
3
Data Parameters From Participatory Surveillance Systems in Human, Animal, and Environmental Health From Around the Globe: Descriptive Analysis.
JMIR Public Health Surveill. 2025 Mar 26;11:e55356. doi: 10.2196/55356.
4
Leveraging Large Language Models for Infectious Disease Surveillance-Using a Web Service for Monitoring COVID-19 Patterns From Self-Reporting Tweets: Content Analysis.
J Med Internet Res. 2025 Feb 20;27:e63190. doi: 10.2196/63190.
5
Dissecting the infodemic: An in-depth analysis of COVID-19 misinformation detection on X (formerly Twitter) utilizing machine learning and deep learning techniques.
Heliyon. 2024 Sep 12;10(18):e37760. doi: 10.1016/j.heliyon.2024.e37760. eCollection 2024 Sep 30.
6
COVIDHealth: A novel labeled dataset and machine learning-based web application for classifying COVID-19 discourses on Twitter.
Heliyon. 2024 Jul 8;10(14):e34103. doi: 10.1016/j.heliyon.2024.e34103. eCollection 2024 Jul 30.
7
A Novel Approach for the Early Detection of Medical Resource Demand Surges During Health Care Emergencies: Infodemiology Study of Tweets.
JMIR Form Res. 2024 Jan 29;8:e46087. doi: 10.2196/46087.
8
Detecting nuance in conspiracy discourse: Advancing methods in infodemiology and communication science with machine learning and qualitative content coding.
PLoS One. 2023 Dec 20;18(12):e0295414. doi: 10.1371/journal.pone.0295414. eCollection 2023.
9
Detection and Characterization of Web-Based Pediatric COVID-19 Vaccine Discussions and Racial and Ethnic Minority Topics: Retrospective Analysis of Twitter Data.
JMIR Pediatr Parent. 2023 Nov 30;6:e48004. doi: 10.2196/48004.
10
Using Social Media to Help Understand Patient-Reported Health Outcomes of Post-COVID-19 Condition: Natural Language Processing Approach.
J Med Internet Res. 2023 Sep 19;25:e45767. doi: 10.2196/45767.

References

1
Global Sentiments Surrounding the COVID-19 Pandemic on Twitter: Analysis of Twitter Trends.
JMIR Public Health Surveill. 2020 May 22;6(2):e19447. doi: 10.2196/19447.
2
Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational Infoveillance Study.
JMIR Public Health Surveill. 2020 Apr 21;6(2):e18700. doi: 10.2196/18700.
3
Top Concerns of Tweeters During the COVID-19 Pandemic: Infoveillance Study.
J Med Internet Res. 2020 Apr 21;22(4):e19016. doi: 10.2196/19016.
4
Level of underreporting including underdiagnosis before the first peak of COVID-19 in various countries: Preliminary retrospective results based on wavelets and deterministic modeling.
Infect Control Hosp Epidemiol. 2020 Jul;41(7):857-859. doi: 10.1017/ice.2020.116. Epub 2020 Apr 9.
5
US Public Concerns About the COVID-19 Pandemic From Results of a Survey Given via Social Media.
JAMA Intern Med. 2020 Jul 1;180(7):1020-1022. doi: 10.1001/jamainternmed.2020.1369.
6
COVID-19 Related Misinformation on Social Media: A Qualitative Study from Iran.
J Med Internet Res. 2020 Apr 5. doi: 10.2196/18932.
7
Estimates of the severity of coronavirus disease 2019: a model-based analysis.
Lancet Infect Dis. 2020 Jun;20(6):669-677. doi: 10.1016/S1473-3099(20)30243-7. Epub 2020 Mar 30.
8
Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing.
Science. 2020 May 8;368(6491). doi: 10.1126/science.abb6936. Epub 2020 Mar 31.
9
Epidemiological data from the COVID-19 outbreak, real-time case information.
Sci Data. 2020 Mar 24;7(1):106. doi: 10.1038/s41597-020-0448-0.
10
Covid-19 fatality is likely overestimated.
BMJ. 2020 Mar 20;368:m1113. doi: 10.1136/bmj.m1113.