The Institute for Artificial Intelligence Research and Development of Serbia, Novi Sad, Serbia.
Faculty of Computer Science and Mathematics, University of Passau, Passau, Germany.
J Med Internet Res. 2022 Nov 17;24(11):e42261. doi: 10.2196/42261.
Since the first COVID-19 vaccine appeared, there has been growing interest in automatically determining public attitudes toward it. In particular, it was important to find the reasons for vaccine hesitancy, since hesitancy was directly correlated with the protraction of the pandemic. Natural language processing (NLP) and public health researchers have turned to social media (eg, Twitter, Reddit, and Facebook) for user-created content from which they can gauge public opinion on vaccination. To process such content automatically, they use a number of NLP techniques, most notably topic modeling. Topic modeling automatically uncovers and groups the hidden topics in a collection of texts. When applied to content that expresses a negative sentiment toward vaccination, it can give direct insight into the reasons for vaccine hesitancy.
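To make "uncovering and grouping hidden topics" concrete, the following is a minimal sketch of one common topic modeling method, nonnegative matrix factorization (NMF), written in plain Python with the classic multiplicative update rules. It factors a document-term count matrix V into nonnegative document-topic weights W and topic-term weights H, so that the highest-weighted terms in each row of H describe a topic. The toy English "tweets", the use of raw counts (rather than, say, TF-IDF weighting), the number of topics, and all function names are illustrative assumptions, not the authors' preprocessing or implementation.

```python
import random

def bag_of_words(docs):
    """Build a document-term count matrix (rows = docs, columns = vocabulary terms)."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc.split():
            matrix[i][index[tok]] += 1.0
    return matrix, vocab

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor V ~ W H with nonnegative W (docs x topics) and H (topics x terms)
    using multiplicative updates that minimize the squared Frobenius error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h

def top_terms(h, vocab, n=3):
    """Return the n highest-weighted terms for each topic (row of H)."""
    topics = []
    for row in h:
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        topics.append([vocab[j] for j in ranked[:n]])
    return topics

# Hypothetical toy corpus standing in for preprocessed negative tweets.
tweets = [
    "side effects fever clots",
    "fever side effects",
    "clots side effects",
    "government lies trust",
    "trust government pharma lies",
    "pharma lies government",
]
counts, vocab = bag_of_words(tweets)
w, h = nmf(counts, k=2)
for t, terms in enumerate(top_terms(h, vocab)):
    print(f"topic {t}: {terms}")
```

LDA, the other method named in the study, is a probabilistic alternative that infers per-document topic proportions rather than solving a matrix factorization; in practice a library implementation such as scikit-learn's `NMF` or `LatentDirichletAllocation` would normally replace a hand-rolled loop like this one.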
This study applies NLP methods to classify Serbian-language tweets about vaccination by sentiment polarity and to uncover the reasons for vaccine hesitancy expressed in the negative tweets.
To study the attitudes and beliefs behind vaccine hesitancy, we collected 2 batches of tweets that mention some aspect of COVID-19 vaccination. The first batch of 8817 tweets was manually annotated as either relevant or irrelevant to COVID-19 vaccination sentiment, and the relevant tweets were then annotated as positive, negative, or neutral. To augment this initial data set, we used the annotated tweets to train a sequential classifier based on bidirectional encoder representations from transformers (BERT) for 2 tweet classification tasks. The first classifier distinguished between relevant and irrelevant tweets. The second classifier took the relevant tweets and classified them as negative, positive, or neutral. This sequential classifier was then used to annotate the second batch of tweets. The combined data sets yielded 3286 tweets with a negative sentiment: 1770 (53.9%) from the manually annotated data set and 1516 (46.1%) from automatic classification. Topic modeling methods (latent Dirichlet allocation [LDA] and nonnegative matrix factorization [NMF]) were applied to the 3286 preprocessed tweets to detect the reasons for vaccine hesitancy.
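The sequential classifier described above can be sketched as a two-stage cascade: every tweet passes through the relevance classifier, and only tweets judged relevant reach the sentiment classifier. In the study both stages are BERT-based models; in this sketch they are replaced by hypothetical keyword rules (`toy_relevance`, `toy_sentiment`) so that the routing logic runs on its own. Any callable with the same signature, such as a wrapper around a trained model, could be plugged in instead. The English example tweets are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

@dataclass
class Annotation:
    text: str
    relevant: bool
    sentiment: Optional[str]  # None for tweets filtered out in stage 1

def toy_relevance(text: str) -> bool:
    """Hypothetical stand-in for the trained relevance classifier."""
    return "vaccine" in text.lower()

# Hypothetical keyword lists; a real polarity model would learn these cues.
NEGATIVE = {"dangerous", "untested", "refuse"}
POSITIVE = {"safe", "grateful", "vaccinated"}

def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for the trained polarity classifier."""
    tokens = {tok.strip(".,!?") for tok in text.lower().split()}
    if tokens & NEGATIVE:
        return "negative"
    if tokens & POSITIVE:
        return "positive"
    return "neutral"

def cascade(tweets: Iterable[str],
            is_relevant: Callable[[str], bool],
            polarity: Callable[[str], str]) -> List[Annotation]:
    """Stage 1 filters by relevance; stage 2 labels the polarity of relevant tweets only."""
    results = []
    for text in tweets:
        if is_relevant(text):
            results.append(Annotation(text, True, polarity(text)))
        else:
            results.append(Annotation(text, False, None))
    return results

def negative_tweets(annotations: List[Annotation]) -> List[str]:
    """Collect the tweets that the cascade labeled negative."""
    return [a.text for a in annotations if a.sentiment == "negative"]

batch = [
    "The vaccine is dangerous and untested",
    "Got my vaccine today, feeling grateful",
    "Traffic on the highway again",
    "Vaccine centre opens at 9",
]
for a in cascade(batch, toy_relevance, toy_sentiment):
    print(a.relevant, a.sentiment, a.text)
```

Keeping the stages separate means the polarity model only ever sees tweets that are about vaccination. The negatives produced this way are then pooled with the manually annotated negatives, which in the study gave 1770 + 1516 = 3286 tweets for topic modeling.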
The relevance classifier achieved F-scores of 0.91 and 0.96 for relevant and irrelevant tweets, respectively. The sentiment polarity classifier achieved F-scores of 0.87, 0.85, and 0.85 for negative, neutral, and positive sentiments, respectively. By summarizing the topics obtained from both models, we extracted 5 main groups of reasons for vaccine hesitancy: concern over vaccine side effects, concern over vaccine effectiveness, concern over insufficiently tested vaccines, mistrust of authorities, and conspiracy theories.
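The classifier results are reported as per-class F-scores. Assuming the balanced F1 variant (the harmonic mean of precision and recall; the abstract does not state which variant was used), a per-class computation from gold and predicted labels can be sketched as follows. The label lists are hypothetical.

```python
from collections import Counter

def per_class_f1(gold, pred):
    """Return {label: F1}, where F1 = 2PR / (P + R) for each class (assumes beta = 1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count, gold_count = Counter(pred), Counter(gold)
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        # Precision: correct predictions of this label / all predictions of it.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        # Recall: correct predictions of this label / all gold instances of it.
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores

gold = ["negative", "negative", "positive", "neutral"]
pred = ["negative", "positive", "positive", "neutral"]
print(per_class_f1(gold, pred))
```

Reporting one score per class, as the abstract does, shows whether a classifier is weaker on a particular label, which a single averaged score would hide.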
This paper presents a combination of NLP methods applied to find the reasons for vaccine hesitancy in Serbia. Identifying these reasons makes it possible to better understand people's concerns about the vaccination process.