Perikli Nicholas, Bhattacharya Srimoy, Ogbuokiri Blessing, Movahedi Nia Zahra, Lieberman Benjamin, Tripathi Nidhi, Dahbi Salah-Eddine, Stevenson Finn, Bragazzi Nicola, Kong Jude, Mellado Bruce
School of Physics and Institute for Collider Particle Physics, University of the Witwatersrand, Johannesburg, South Africa.
iThemba LABS, National Research Foundation, Cape Town, South Africa.
PLOS Digit Health. 2024 Jul 30;3(7):e0000545. doi: 10.1371/journal.pdig.0000545. eCollection 2024 Jul.
Manually labeling data for supervised learning is time- and energy-consuming; therefore, lexicon-based models such as VADER and TextBlob are used to label data automatically. However, it has been argued that automated labels lack the accuracy required to train an efficient model. Although automated labeling is frequently used for stance detection, automated stance labels have not been properly evaluated in previous work. In this work, to assess the accuracy of VADER and TextBlob automated labels for stance analysis, we first manually label a Twitter (now X) dataset for M-pox stance detection. We then fine-tune several transformer-based models on the hand-labeled M-pox dataset and compare their accuracy, before and after fine-tuning, with that of the automatically labeled data. Our results indicated that the fine-tuned models surpassed the accuracy of the VADER and TextBlob automated labels by up to 38% and 72.5%, respectively. Topic modeling further showed that fine-tuning narrowed the scope of misclassified tweets to specific sub-topics. We conclude that fine-tuning transformer models on hand-labeled data for stance detection raises accuracy to a level significantly higher than that of automated stance detection labels. This study verifies that automated stance detection labels are not reliable for sensitive use cases such as health-related applications. Manually labeled data is better suited for developing Natural Language Processing (NLP) models that study and analyze public opinion and conversations on social media platforms during crises such as pandemics and epidemics.
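The abstract does not specify how VADER and TextBlob scores were mapped to stance labels or evaluated against the hand labels; the following is a minimal sketch of that comparison, assuming a three-way label set ("positive"/"negative"/"neutral") and the commonly used VADER compound-score thresholds of ±0.05. All data and thresholds here are illustrative, not the paper's.

```python
# Sketch: score automated VADER/TextBlob labels against hand labels.
# Assumptions (not given in the abstract): three-way labels and the
# conventional +/-0.05 VADER compound thresholds.
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()

def vader_label(text: str) -> str:
    # VADER's compound score lies in [-1, 1].
    compound = vader.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

def textblob_label(text: str) -> str:
    # TextBlob polarity also lies in [-1, 1]; zero is treated as neutral.
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

def accuracy(tweets, hand_labels, labeler) -> float:
    # Fraction of tweets where the automated label matches the hand label.
    hits = sum(labeler(t) == y for t, y in zip(tweets, hand_labels))
    return hits / len(tweets)

# Hypothetical hand-labeled examples for illustration only.
tweets = ["M-pox vaccines are saving lives.", "I don't trust these vaccines."]
hand_labels = ["positive", "negative"]
print("VADER accuracy:   ", accuracy(tweets, hand_labels, vader_label))
print("TextBlob accuracy:", accuracy(tweets, hand_labels, textblob_label))
```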
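For the fine-tuning side of the comparison, a minimal sketch using the Hugging Face Trainer API is shown below. The model name, label encoding, and hyperparameters are assumptions for illustration; the paper evaluates several transformer variants, which the abstract does not name.

```python
# Sketch: fine-tune a transformer on hand-labeled stance data.
# "bert-base-uncased", the 3-class label encoding, and all
# hyperparameters are assumed, not taken from the paper.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # assumed stand-in for the paper's models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3)  # 0=negative, 1=neutral, 2=positive (assumed)

# Hypothetical hand-labeled tweets; a real run would use the full dataset.
data = Dataset.from_dict({
    "text": ["M-pox vaccines are saving lives.", "I don't trust these vaccines."],
    "label": [2, 0],
})
data = data.map(lambda ex: tokenizer(
    ex["text"], truncation=True, padding="max_length", max_length=64))

args = TrainingArguments(output_dir="mpox-stance", num_train_epochs=3,
                         per_device_train_batch_size=8, logging_steps=10)
Trainer(model=model, args=args, train_dataset=data).train()
```

After training, the fine-tuned model's predictions on a held-out hand-labeled split can be scored with the same accuracy function as the automated labelers, which is what makes the two approaches directly comparable.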