Kummervold Per E, Martin Sam, Dada Sara, Kilich Eliz, Denny Chermain, Paterson Pauline, Larson Heidi J
Vaccine Research Department, FISABIO-Public Health, Valencia, Spain.
Centre for Clinical Vaccinology and Tropical Medicine, University of Oxford, Oxford, United Kingdom.
JMIR Med Inform. 2021 Oct 8;9(10):e29584. doi: 10.2196/29584.
Social media has become an established platform for individuals to discuss and debate various subjects, including vaccination. With growing conversations on the web and less than desired maternal vaccination uptake rates, these conversations could provide useful insights to inform future interventions. However, owing to the volume of web-based posts, manual annotation and analysis are difficult and time consuming. Automated processes for this type of analysis, such as natural language processing, have faced challenges in extracting complex stances such as attitudes toward vaccination from large amounts of text.
The aim of this study is to build upon recent advances in transformer-based machine learning methods and test whether they could be used as a tool to assess the stance expressed in social media posts toward vaccination during pregnancy.
A total of 16,604 tweets posted between November 1, 2018, and April 30, 2019, were selected using keyword searches related to maternal vaccination. After excluding irrelevant tweets, the remaining tweets were coded by 3 individual researchers into the categories Promotional, Discouraging, Ambiguous, and Neutral or No Stance. After creating a final data set of 2722 unique tweets, multiple machine learning techniques were trained on a part of this data set and then tested and compared with the human annotators.
We found the accuracy of the machine learning techniques to be 81.8% (F score=0.78) when measured against the final score agreed among the 3 annotators. For comparison, the accuracies of the individual annotators measured against the same final score were 83.3%, 77.9%, and 77.5%.
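As an illustration of how figures of this kind can be computed, the following minimal sketch scores a set of predicted stance labels against agreed reference labels. It is not the authors' evaluation code: the function names are hypothetical, and the macro-averaged F score is an assumption, since the abstract does not state which averaging was used.

```python
# Hypothetical sketch: accuracy and macro-averaged F score for stance labels.
# The four category names come from the abstract; everything else is assumed.
LABELS = ["Promotional", "Discouraging", "Ambiguous", "Neutral or No Stance"]


def accuracy(reference, predicted):
    """Share of items whose predicted label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def macro_f1(reference, predicted, labels=LABELS):
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    scores = []
    for label in labels:
        pairs = list(zip(reference, predicted))
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        denom = 2 * tp + fp + fn
        # A label that never appears in either list contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    agreed = ["Promotional", "Discouraging", "Ambiguous", "Neutral or No Stance"]
    model = ["Promotional", "Discouraging", "Ambiguous", "Promotional"]
    print(f"accuracy={accuracy(agreed, model):.3f}")
    print(f"macro F1={macro_f1(agreed, model):.3f}")
```

The same two functions could be applied to each human annotator's labels in place of the model's, which is the kind of comparison the reported 83.3%, 77.9%, and 77.5% figures describe.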
This study demonstrates that we are able to achieve close to the same accuracy in categorizing tweets using our machine learning models as could be expected from a single human coder. Using this automated process, which is reliable and accurate, could free valuable time and resources for conducting this analysis and help inform potentially effective and necessary interventions.