Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning.

Authors

Liu Yang, Whitfield Christopher, Zhang Tianyang, Hauser Amanda, Reynolds Taeyonn, Anwar Mohd

Affiliations

Human-Centered AI (HC-AI) Lab, North Carolina A&T State University, Greensboro, NC 27411 USA.

University of Massachusetts Amherst, Amherst, MA 01003 USA.

Publication information

Health Inf Sci Syst. 2021 Jun 25;9(1):25. doi: 10.1007/s13755-021-00158-4. eCollection 2021 Dec.

DOI:10.1007/s13755-021-00158-4
PMID:34188896
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8226148/
Abstract

PURPOSE

It has been over a year since the first known case of coronavirus disease (COVID-19) emerged, yet the pandemic is far from over. To date, the coronavirus pandemic has infected over eighty million people and has killed more than 1.78 million worldwide. This study aims to explore "" and "". The purpose of this study was to compare people's thoughts, behavior changes, discussion topics, and the number of confirmed cases and deaths by applying natural language processing (NLP) to COVID-19 related data.

METHODS

In this study, we collected COVID-19 related data from 18 subreddits of North Carolina from March to August 2020. Next, we applied methods from natural language processing and machine learning to analyze the collected Reddit posts using feature engineering, topic modeling, custom named-entity recognition (NER), and BERT-based (Bidirectional Encoder Representations from Transformers) sentence clustering. Using these methods, we were able to glean people's responses to, and concerns about, the COVID-19 pandemic in North Carolina.

RESULTS

We observed a positive change in attitudes towards masks for residents in North Carolina. The high-frequency words in all subreddit corpora for each of the COVID-19 mitigation strategy categories are: Distancing (DIST)-"", "", and ""; Disinfection (DIT)-"", "", and ""; Personal Protective Equipment (PPE)-"", "", and ""; Symptoms (SYM)-"", "", and ""; Testing (TEST)-"", "( "".
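The per-category high-frequency words reported above can be illustrated with a small counting sketch. This is not the authors' pipeline: the lexicon, the sample posts, and the helper names (`CATEGORY_TERMS`, `tokenize`, `top_terms_by_category`) are all hypothetical, and the actual high-frequency words from the study are missing from this record, so the terms below are placeholders only.

```python
from collections import Counter
import re

# Hypothetical lexicon for three of the mitigation categories.
# These terms are illustrative placeholders, NOT the study's vocabulary.
CATEGORY_TERMS = {
    "DIST": {"distancing", "quarantine", "lockdown"},
    "PPE": {"mask", "gloves", "gown"},
    "TEST": {"test", "testing", "swab"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms_by_category(posts, lexicon, k=3):
    """Count lexicon terms across all posts and return the k most
    frequent terms for each category as (term, count) pairs."""
    counts = {category: Counter() for category in lexicon}
    for post in posts:
        for token in tokenize(post):
            for category, terms in lexicon.items():
                if token in terms:
                    counts[category][token] += 1
    return {category: c.most_common(k) for category, c in counts.items()}


# Made-up posts standing in for a subreddit corpus.
posts = [
    "Wear a mask and keep distancing at the store.",
    "Got a test today; the mask rule starts Friday.",
    "Is testing free? My mask broke.",
]
result = top_terms_by_category(posts, CATEGORY_TERMS)
print(result["PPE"])
```

In practice a lexicon like this would more plausibly come from a trained custom NER model than from a hand-written set, which is what the abstract describes.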

CONCLUSION

The findings of our study show that using Reddit data to monitor the COVID-19 pandemic in North Carolina (NC) was effective. The study shows the utility of NLP methods (e.g. cosine similarity, Latent Dirichlet Allocation (LDA) topic modeling, custom NER and BERT-based sentence clustering) in discovering changes in the public's concerns and behaviors over the course of the COVID-19 pandemic in NC using Reddit data. Moreover, the results show that social media data can be used to surveil the epidemic situation in a specific community.
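Of the methods named in the conclusion, cosine similarity is the simplest to show in a few lines. The sketch below compares two term-frequency vectors; it assumes plain word counts as features, and the two sample texts and the names `term_counts` and `cosine_similarity` are hypothetical. The abstract does not say which features the authors fed into their similarity computation.

```python
from collections import Counter
import math
import re


def term_counts(text):
    """Return a bag-of-words term-frequency vector for the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors.
    Returns 0.0 when either vector is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up snippets standing in for posts from two different months.
march = term_counts("schools closed and stores out of sanitizer")
august = term_counts("schools reopening and mask rules in stores")

similarity = cosine_similarity(march, august)
print(round(similarity, 3))
```

A value near 1 would suggest the two periods share most of their vocabulary, while a value near 0 would suggest the discussion has shifted to different terms.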

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68bb/8227921/b606c3ec2ac0/13755_2021_158_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68bb/8227921/c274ec774d77/13755_2021_158_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68bb/8227921/6dafb4407b6f/13755_2021_158_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68bb/8227921/195878bab34d/13755_2021_158_Fig4_HTML.jpg

Similar articles

1
Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning.
Health Inf Sci Syst. 2021 Jun 25;9(1):25. doi: 10.1007/s13755-021-00158-4. eCollection 2021 Dec.
2
Impact of COVID-19 Pandemic on Social Determinants of Health Issues of Marginalized Black and Asian Communities: A Social Media Analysis Empowered by Natural Language Processing.
J Racial Ethn Health Disparities. 2025 Jun;12(3):1641-1656. doi: 10.1007/s40615-024-01996-0. Epub 2024 Apr 16.
3
Perspectives of the COVID-19 Pandemic on Reddit: Comparative Natural Language Processing Study of the United States, the United Kingdom, Canada, and Australia.
JMIR Infodemiology. 2022 Sep 27;2(2):e36941. doi: 10.2196/36941. eCollection 2022 Jul-Dec.
4
Sexually Transmitted Disease-Related Reddit Posts During the COVID-19 Pandemic: Latent Dirichlet Allocation Analysis.
J Med Internet Res. 2022 Oct 31;24(10):e37258. doi: 10.2196/37258.
5
Exploring COVID-19-Related Stressors: Topic Modeling Study.
J Med Internet Res. 2022 Jul 13;24(7):e37142. doi: 10.2196/37142.
6
User Dynamics and Thematic Exploration in r/Depression During the COVID-19 Pandemic: Insights From Overlapping r/SuicideWatch Users.
J Med Internet Res. 2024 May 20;26:e53968. doi: 10.2196/53968.
7
Understanding Mental Health Issues in Different Subdomains of Social Networking Services: Computational Analysis of Text-Based Reddit Posts.
J Med Internet Res. 2023 Nov 30;25:e49074. doi: 10.2196/49074.
8
Managing Type 2 Diabetes During the COVID-19 Pandemic: Scoping Review and Qualitative Study Using Systematic Literature Review and Reddit.
Interact J Med Res. 2024 Aug 8;13:e49073. doi: 10.2196/49073.
9
An integrated clustering and BERT framework for improved topic modeling.
Int J Inf Technol. 2023;15(4):2187-2195. doi: 10.1007/s41870-023-01268-w. Epub 2023 May 6.
10
Concerns Expressed by Chinese Social Media Users During the COVID-19 Pandemic: Content Analysis of Sina Weibo Microblogging Data.
J Med Internet Res. 2020 Nov 26;22(11):e22152. doi: 10.2196/22152.

Cited by

1
AI Methods Tailored to Influenza, RSV, HIV, and SARS-CoV-2: A Focused Review.
Pathogens. 2025 Jul 30;14(8):748. doi: 10.3390/pathogens14080748.
2
Performance Improvement of a Natural Language Processing Tool for Extracting Patient Narratives Related to Medical States From Japanese Pharmaceutical Care Records by Increasing the Amount of Training Data: Natural Language Processing Analysis and Validation Study.
JMIR Med Inform. 2025 Mar 4;13:e68863. doi: 10.2196/68863.
3
Targeting COVID-19 and Human Resources for Health News Information Extraction: Algorithm Development and Validation.
JMIR AI. 2024 Oct 30;3:e55059. doi: 10.2196/55059.
4
Impact of COVID-19 Pandemic on Social Determinants of Health Issues of Marginalized Black and Asian Communities: A Social Media Analysis Empowered by Natural Language Processing.
J Racial Ethn Health Disparities. 2025 Jun;12(3):1641-1656. doi: 10.1007/s40615-024-01996-0. Epub 2024 Apr 16.
5
Artificial intelligence-based framework to identify the abnormalities in the COVID-19 disease and other common respiratory diseases from digital stethoscope data using deep CNN.
Health Inf Sci Syst. 2024 Mar 9;12(1):22. doi: 10.1007/s13755-024-00283-w. eCollection 2024 Dec.
6
Changes to Public Health Surveillance Methods Due to the COVID-19 Pandemic: Scoping Review.
JMIR Public Health Surveill. 2024 Jan 19;10:e49185. doi: 10.2196/49185.
7
Emotional Expression on Social Media Support Forums for Substance Cessation: Observational Study of Text-Based Reddit Posts.
J Med Internet Res. 2023 Jul 19;25:e45267. doi: 10.2196/45267.
8
Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada.
Front Digit Health. 2023 Jun 28;5:1203874. doi: 10.3389/fdgth.2023.1203874. eCollection 2023.
9
Deep Learning Model for COVID-19 Sentiment Analysis on Twitter.
New Gener Comput. 2023;41(2):189-212. doi: 10.1007/s00354-023-00209-2. Epub 2023 Mar 13.
10
Modeling approaches for early warning and monitoring of pandemic situations as well as decision support.
Front Public Health. 2022 Nov 14;10:994949. doi: 10.3389/fpubh.2022.994949. eCollection 2022.

References

1
Addressing immediate public coronavirus (COVID-19) concerns through social media: Utilizing Reddit's AMA as a framework for Public Engagement with Science.
PLoS One. 2020 Oct 6;15(10):e0240326. doi: 10.1371/journal.pone.0240326. eCollection 2020.
2
Collective Response to Media Coverage of the COVID-19 Pandemic on Reddit and Wikipedia: Mixed-Methods Analysis.
J Med Internet Res. 2020 Oct 12;22(10):e21597. doi: 10.2196/21597.
3
Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study.
J Med Internet Res. 2020 Oct 12;22(10):e22635. doi: 10.2196/22635.
4
Public Priorities and Concerns Regarding COVID-19 in an Online Discussion Forum: Longitudinal Topic Modeling.
J Gen Intern Med. 2020 Jul;35(7):2244-2247. doi: 10.1007/s11606-020-05889-w. Epub 2020 May 12.
5
Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and ConsumerGenerated Data.
Yearb Med Inform. 2019 Aug;28(1):208-217. doi: 10.1055/s-0039-1677918. Epub 2019 Aug 16.
6
Tracking Health Related Discussions on Reddit for Public Health Applications.
AMIA Annu Symp Proc. 2018 Apr 16;2017:1362-1371. eCollection 2017.
7
Social media and internet-based data in global systems for public health surveillance: a systematic review.
Milbank Q. 2014 Mar;92(1):7-33. doi: 10.1111/1468-0009.12038.