Information Sciences Institute, University of Southern California, Marina del Rey, CA, United States.
Department of Computer Science, University of Southern California, Los Angeles, CA, United States.
JMIR Public Health Surveill. 2021 Nov 17;7(11):e30642. doi: 10.2196/30642.
False claims about COVID-19 vaccines can undermine public trust in ongoing vaccination campaigns, posing a threat to global public health. Misinformation originating from various sources has been spreading on the web since the beginning of the COVID-19 pandemic. Antivaccine activists have also begun to use platforms such as Twitter to promote their views. To properly understand the phenomenon of vaccine hesitancy through the lens of social media, it is of great importance to gather the relevant data.
In this paper, we describe a data set of Twitter posts and Twitter accounts that publicly exhibit a strong antivaccine stance. The data set is made available to the research community via our AvaxTweets data set GitHub repository. We characterize the collected accounts in terms of prominent hashtags, shared news sources, and most likely political leaning.
We started the ongoing data collection on October 18, 2020, leveraging the Twitter streaming application programming interface (API) to follow a set of specific antivaccine-related keywords. Then, we collected the historical tweets of the set of accounts that engaged in spreading antivaccination narratives between October 2020 and December 2020, leveraging the Academic Track Twitter API. The political leaning of the accounts was estimated by measuring the political bias of the media outlets they shared.
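As a rough illustration of the last step, the sketch below scores an account by averaging the bias of the news domains it shared. This is a minimal sketch under stated assumptions, not the procedure used for the data set: the `DOMAIN_BIAS` table, its placeholder domains, the numeric scale from -1 (left) to 1 (right), and the choice of a simple mean are all hypothetical.

```python
from statistics import mean
from urllib.parse import urlparse

# Hypothetical bias scores in [-1, 1]: negative = left, positive = right.
# A real analysis would take these from an external media-bias rating source.
DOMAIN_BIAS = {
    "left-example.com": -1.0,
    "center-example.com": 0.0,
    "right-example.com": 1.0,
}


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def estimate_leaning(shared_urls):
    """Average the bias scores of the rated domains an account shared.

    Returns None when none of the shared URLs point to a rated domain.
    """
    scores = [
        DOMAIN_BIAS[d] for d in map(domain_of, shared_urls) if d in DOMAIN_BIAS
    ]
    return mean(scores) if scores else None


# Example: one right-rated link, one center-rated link, one unrated link.
example_urls = [
    "https://www.right-example.com/story",
    "http://center-example.com/report",
    "https://unrated-example.org/post",
]
leaning = estimate_leaning(example_urls)
```

Unrated domains are skipped rather than counted as neutral, so an account that shares only unrated sources receives no estimate instead of a misleading score of zero.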
We gathered two curated Twitter data collections and made them publicly available: (1) a streaming keyword-centered data collection with more than 1.8 million tweets, and (2) a historical account-level data collection with more than 135 million tweets. The accounts engaged in antivaccination narratives lean toward the right (conservative) end of the political spectrum. Vaccine hesitancy is fueled by misinformation originating from websites of already questionable credibility.
Vaccine-related misinformation on social media may exacerbate vaccine hesitancy, hampering progress toward vaccine-induced herd immunity, and could potentially increase the number of infections related to new COVID-19 variants. For these reasons, understanding vaccine hesitancy through the lens of social media is of paramount importance. Because data access is the first obstacle to attaining this goal, we published a data set that can be used to study antivaccine misinformation on social media and that enables a better understanding of vaccine hesitancy.