

Analyzing Reddit Social Media Content in the United States Related to H5N1: Sentiment and Topic Modeling Study.

Authors

Pang Oscar, Movahedi Nia Zahra, Gillies Murray, Leung Doris, Bragazzi Nicola, Gizo Itlala, Kong Jude Dzevela

Affiliations

Artificial Intelligence and Mathematical Modeling Lab, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Department of Computer Science, University of Toronto, Toronto, ON, Canada.

Publication information

J Med Internet Res. 2025 Sep 9;27:e70746. doi: 10.2196/70746.

Abstract

BACKGROUND

The H5N1 avian influenza A virus represents a serious threat to both animal and human health, with the potential to escalate into a global pandemic. Effective monitoring of social media during H5N1 avian influenza outbreaks could offer critical insights to guide public health strategies. Social media platforms like Reddit, with their diverse and region-specific communities, provide a rich source of data that can reveal collective attitudes, concerns, and behavioral trends in real time.

OBJECTIVE

This study aims to analyze Reddit comments from state-specific subreddits in the United States from the most recent outbreak period of 2022 to 2024 to (1) assess the sentiments expressed as the H5N1 outbreak progresses; (2) identify predominant topics discussed, particularly those corresponding to negative sentiments; and (3) explore correlations between these sentiments or topics and the severity and spread of the outbreak in respective regions.

METHODS

We collected 2152 Reddit comments from 160 subreddits across 11 highly impacted states from February 2022 to July 2024. Outbreak data comprising almost 600 entries were obtained from the US Department of Agriculture database. Sentiment classification was performed using a fine-tuned Bidirectional Encoder Representations From Transformers (BERT) base model, and comments were categorized into 6 emotions: anger, fear, joy, love, sadness, and surprise, with a seventh "neutral" category added for low-confidence classifications. Topic modeling was conducted using BERTopic and latent Dirichlet allocation models. Statistical analyses included calculating correlations between sentiment intensity and outbreak severity levels and applying the Mann-Whitney U test to assess differences between sentiment categories.
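The "neutral" fallback for low-confidence classifications can be illustrated with a minimal sketch. The abstract does not state the cutoff used, so the `min_confidence` value of 0.5, the function name, and the example scores below are all hypothetical; the scores simply stand in for the per-emotion probabilities a fine-tuned classifier might return.

```python
from typing import Dict

# The six emotion categories named in the Methods section.
EMOTIONS = ("anger", "fear", "joy", "love", "sadness", "surprise")


def label_with_neutral_fallback(scores: Dict[str, float],
                                min_confidence: float = 0.5) -> str:
    """Return the top-scoring emotion, or "neutral" if its score is too low.

    `min_confidence` is an assumed cutoff, not a value from the study.
    """
    top_label = max(scores, key=scores.get)
    if scores[top_label] < min_confidence:
        return "neutral"
    return top_label


# Hypothetical classifier outputs for two comments.
confident = {"anger": 0.05, "fear": 0.80, "joy": 0.02,
             "love": 0.01, "sadness": 0.10, "surprise": 0.02}
ambiguous = {"anger": 0.20, "fear": 0.25, "joy": 0.15,
             "love": 0.10, "sadness": 0.20, "surprise": 0.10}

print(label_with_neutral_fallback(confident))
print(label_with_neutral_fallback(ambiguous))
```

In this sketch the first comment keeps its top emotion, while the second falls back to "neutral" because no single emotion reaches the assumed cutoff.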

RESULTS

The findings illustrate that H5N1 unfolded in mostly discrete national waves and that only a subset of states (Minnesota and Iowa) experienced chronic, multiwave exposure, a pattern obscured in national aggregates. Sentiment intensity scoring revealed that although 90% (n=1931) of discourse was negative, emotions differed in how they tracked the epidemic: fear aligned weekly with real-time case counts (r=0.11), whereas anger, sadness, and even joy surged 3 weeks after the outbreak (r=0.20 to 0.24 after the lag was considered). When both the 3-week lag and an outlier month in terms of outbreak cases were adjusted for simultaneously, those associations strengthened further (overall r=0.223), showing how delayed reactions and anomalous surges can mask true sentiment-epidemiology links if left uncorrected. This defines the window in which risk communicators can pre-empt misinformation and economic anxiety. Topic modeling uncovered recurring themes of concern: avian flu culling, sharp egg-price hikes, and frustration over prolonged biosecurity measures. BERTopic provided more coherent and locally specific topics than latent Dirichlet allocation.
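To make the idea of a lag-adjusted correlation concrete, the sketch below pairs case counts in week t with sentiment intensity in week t + lag and computes a Pearson r over the aligned pairs. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the toy weekly series are hypothetical, and the outlier-month adjustment described above is not reproduced.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_r(cases: Sequence[float], sentiment: Sequence[float],
             lag_weeks: int) -> float:
    """Correlate cases in week t with sentiment in week t + lag_weeks."""
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be non-negative")
    if lag_weeks == 0:
        return pearson_r(cases, sentiment)
    # Drop the last `lag_weeks` case values and the first `lag_weeks`
    # sentiment values so that the remaining entries line up.
    return pearson_r(cases[:-lag_weeks], sentiment[lag_weeks:])


# Toy weekly series (hypothetical): sentiment echoes cases 3 weeks later.
cases = [0, 5, 20, 8, 2, 0, 0, 1, 0, 0]
sentiment = [0, 0, 0, 0, 5, 20, 8, 2, 0, 0]

for lag in (0, 3):
    print(f"lag={lag} weeks: r={lagged_r(cases, sentiment, lag):.3f}")
```

On this toy data the unlagged correlation is weak while the 3-week-lagged correlation is strong, which mirrors the qualitative point that an uncorrected delay can mask an association.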

CONCLUSIONS

Overall, these results underscore the critical role of social media analysis in understanding public reactions, including prevalent themes and sentiments, and guiding timely, targeted public health interventions during the H5N1 outbreak.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be67/12457856/452be839b150/jmir_v27i1e70746_fig1.jpg
