Long COVID Discourse in Canada, the United States, and Europe: Topic Modeling and Sentiment Analysis of Twitter Data.
AUTHOR INFORMATION
AbuRaed Ahmed Ghassan Tawfiq, Prikryl Emil Azuma, Carenini Giuseppe, Janjua Naveed Zafar
AFFILIATIONS
Department of Computer Science, The University of British Columbia, Vancouver, BC, Canada.
NOSM University, Thunder Bay, ON, Canada.
PUBLICATION INFORMATION
J Med Internet Res. 2024 Dec 9;26:e59425. doi: 10.2196/59425.
BACKGROUND
Social media serves as a vast repository of data, offering insights into public perceptions and emotions surrounding significant societal issues. Amid the COVID-19 pandemic, long COVID (formally known as post-COVID-19 condition) has emerged as a chronic health condition, profoundly impacting numerous lives and livelihoods. Given the dynamic nature of long COVID and our evolving understanding of it, effectively capturing people's sentiments and perceptions through social media becomes increasingly crucial. By harnessing the wealth of data available on social platforms, we can better track the evolving narrative surrounding long COVID and the collective efforts to address this pressing issue.
OBJECTIVE
This study aimed to investigate people's perceptions and sentiments around long COVID in Canada, the United States, and Europe, by analyzing English-language tweets from these regions using advanced topic modeling and sentiment analysis techniques. Understanding regional differences in public discourse can inform tailored public health strategies.
METHODS
We analyzed long COVID-related tweets from 2021. Contextualized topic modeling was used to capture word meanings in context, yielding coherent and semantically meaningful topics. To ensure robust topic modeling results, we evaluated the topics with metrics such as normalized pointwise mutual information (for coherence) and topic diversity. Sentiment analysis was conducted in a zero-shot manner using Llama 2, a large language model, to classify each tweet as positive, negative, or neutral. The results were interpreted in collaboration with public health experts, comparing the timelines of topics discussed across the 3 regions. This dual approach enabled a comprehensive understanding of the public discourse surrounding long COVID.
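The two evaluation metrics named above can be illustrated with a minimal sketch. It assumes document-level co-occurrence counts for normalized pointwise mutual information (NPMI) and defines topic diversity as the share of unique words among the top words of all topics. The function names, the toy tweets, and the handling of degenerate cases are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of one topic (needs at least two words).

    Probabilities are estimated from document-level co-occurrence, which is
    an assumption of this sketch; other windowing schemes are also common.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: lower bound of NPMI
        elif p12 == 1:
            scores.append(0.0)   # degenerate 0/0 case, treated as 0 here (assumption)
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores)


def topic_diversity(topics, top_k=10):
    """Share of unique words among the top_k words of all topics."""
    top_words = [w for topic in topics for w in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


# Toy, made-up tokenized tweets (not data from the study).
docs = [
    ["long", "covid", "fatigue"],
    ["long", "covid", "brain", "fog"],
    ["vaccine", "children"],
    ["vaccine", "children", "covid"],
]
topics = [["long", "covid", "fatigue"], ["covid", "vaccine", "children"]]

print(npmi_coherence(["vaccine", "children"], docs))
print(topic_diversity(topics, top_k=3))
```

NPMI ranges from -1 (the words never co-occur) to 1 (they always co-occur), so higher mean values suggest more coherent topics. A topic diversity near 1 suggests that topics share few top words.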
RESULTS
Topic modeling identified five main topics: (1) long COVID in people including children in the context of vaccination, (2) duration and suffering associated with long COVID, (3) persistent symptoms of long COVID, (4) the need for research on long COVID treatment, and (5) measuring long COVID symptoms. Significant concern was noted across all regions about the duration and suffering associated with long COVID, along with consistent discussions on persistent symptoms and calls for more research and better treatments. In particular, the topic of persistent symptoms was highly prevalent, reflecting ongoing challenges faced by individuals with long COVID. Sentiment analysis showed a mix of positive and negative sentiments, fluctuating with significant events and news related to long COVID.
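One way to follow how sentiment fluctuates over time is to aggregate per-tweet labels into sentiment shares by region and month. The sketch below assumes each tweet has already been assigned one of the three labels. The record layout, the function name, and the example values are hypothetical and do not reproduce the study's data or results.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "negative", "neutral")


def sentiment_shares(records):
    """Map (region, month) to the share of each sentiment label.

    `records` is an iterable of (region, "YYYY-MM", label) tuples; this
    layout is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for region, month, label in records:
        if label not in LABELS:
            raise ValueError(f"unexpected sentiment label: {label!r}")
        counts[(region, month)][label] += 1

    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Labels that never occur in a group get a share of 0.0.
        shares[key] = {label: counter[label] / total for label in LABELS}
    return shares


# Made-up labels for illustration only.
records = [
    ("Canada", "2021-01", "negative"),
    ("Canada", "2021-01", "negative"),
    ("Canada", "2021-01", "positive"),
    ("Europe", "2021-02", "neutral"),
]
for key, share in sorted(sentiment_shares(records).items()):
    print(key, share)
```

Plotting these shares month by month for each region would make shifts around notable events or news easier to spot.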
CONCLUSIONS
Our study combines natural language processing techniques, including contextualized topic modeling and sentiment analysis, with domain expert input to provide detailed insights for public health monitoring and intervention. These findings highlight the importance of tracking public discourse on long COVID to inform public health strategies, address misinformation, and provide support to affected individuals. The study also underscores the value of social media analysis for understanding public health issues and the role of emerging technologies in enhancing public health responses.