Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study.

Author information

Department of Anesthesiology and Division of Global Public Health and Infectious Diseases, School of Medicine, University of California San Diego, La Jolla, CA, United States.

Global Health Policy Institute, San Diego, CA, United States.

Publication information

JMIR Public Health Surveill. 2020 Jun 8;6(2):e19509. doi: 10.2196/19509.

Abstract

BACKGROUND

The coronavirus disease (COVID-19) pandemic is a global health emergency, with over 6 million cases worldwide as of the beginning of June 2020. The pandemic is historic in scope and without precedent in having emerged in an increasingly digital era. Importantly, there have been concerns about the accuracy of COVID-19 case counts due to issues such as lack of access to testing and difficulty in measuring recoveries.

OBJECTIVE

The aims of this study were to detect and characterize user-generated conversations that could be associated with COVID-19-related symptoms, experiences with access to testing, and mentions of disease recovery using an unsupervised machine learning approach.

METHODS

Tweets were collected from the Twitter public streaming application programming interface from March 3-20, 2020, filtered for general COVID-19-related keywords and then further filtered for terms that could be related to COVID-19 symptoms as self-reported by users. Tweets were analyzed using an unsupervised machine learning approach called the biterm topic model (BTM), where groups of tweets containing the same word-related themes were separated into topic clusters that included conversations about symptoms, testing, and recovery. Tweets in these clusters were then extracted and manually annotated for content analysis and assessed for their statistical and geographic characteristics.
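
The two-stage keyword filter and the word-pair ("biterm") representation that BTM operates on can be illustrated with a minimal sketch. This is not the study's code: the keyword lists, the tokenizer, and the function names `two_stage_filter` and `biterms` are assumptions made for illustration, and the abstract does not list the actual filter terms. The sketch only prepares biterms; it does not fit the topic model itself.

```python
import re
from itertools import combinations

# Hypothetical keyword lists; the abstract does not give the study's actual terms.
GENERAL_TERMS = {"covid", "covid19", "coronavirus"}
SYMPTOM_TERMS = {"fever", "cough", "symptoms"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def two_stage_filter(tweets):
    """Keep tweets that match a general term AND a symptom-related term."""
    kept = []
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        if tokens & GENERAL_TERMS and tokens & SYMPTOM_TERMS:
            kept.append(tweet)
    return kept


def biterms(text):
    """Return unordered word pairs co-occurring in one short text.

    Pairs of identical words are skipped here, which is a simplification
    chosen for this sketch.
    """
    tokens = tokenize(text)
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)
            if pair[0] != pair[1]]


sample = [
    "I have a fever and cough, is it COVID-19?",
    "COVID-19 cases are rising",
    "My cough is bad",
]
filtered = two_stage_filter(sample)
pairs = biterms("dry cough fever")
```

With the sample above, only the first tweet passes both filters. A topic model such as BTM would then be fit on the biterms pooled across all retained tweets, rather than on per-tweet word counts, which is what makes the approach suited to short texts.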

RESULTS

A total of 4,492,954 tweets were collected that contained terms that could be related to COVID-19 symptoms. After using BTM to identify relevant topic clusters and removing duplicate tweets, we identified a total of 3465 (<1%) tweets that included user-generated conversations about experiences that users associated with possible COVID-19 symptoms and other disease experiences. These tweets were grouped into five main categories including first- and secondhand reports of symptoms, symptom reporting concurrent with lack of testing, discussion of recovery, confirmation of negative COVID-19 diagnosis after receiving testing, and users recalling symptoms and questioning whether they might have been previously infected with COVID-19. The co-occurrence of tweets for these themes was statistically significant for users reporting symptoms with a lack of testing and with a discussion of recovery. A total of 63% (n=1112) of the geotagged tweets were located in the United States.
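
The proportions reported above can be checked with simple arithmetic. The sketch below uses only the figures given in the abstract; the implied total of geotagged tweets is an approximation derived from the rounded 63% and is not a number the abstract states.

```python
total_collected = 4_492_954   # tweets containing possible symptom terms
relevant = 3_465              # tweets retained after BTM and deduplication

# Share of collected tweets that were retained, in percent.
relevant_share_pct = relevant / total_collected * 100

us_geotagged = 1_112          # geotagged tweets located in the United States
us_share = 0.63               # reported as 63% (rounded)

# Approximate number of geotagged tweets implied by the two figures above.
implied_geotagged_total = us_geotagged / us_share

print(f"Retained share: {relevant_share_pct:.4f}%")
print(f"Implied geotagged tweets: about {implied_geotagged_total:.0f}")
```

The retained share comes out to roughly 0.08%, consistent with the "<1%" reported in the abstract.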

CONCLUSIONS

This study used unsupervised machine learning for the purposes of characterizing self-reporting of symptoms, experiences with testing, and mentions of recovery related to COVID-19. Many users reported symptoms they thought were related to COVID-19, but they were not able to get tested to confirm their concerns. In the absence of testing availability and confirmation, accurate case estimations for this period of the outbreak may never be known. Future studies should continue to explore the utility of infoveillance approaches to estimate COVID-19 disease severity.
