Elmitwalli Sherif, Mehegan John
Tobacco Control Research Group, Department for Health, University of Bath, Bath, United Kingdom.
Front Big Data. 2024 Mar 20;7:1357926. doi: 10.3389/fdata.2024.1357926. eCollection 2024.
Sentiment analysis has become a crucial area of research in natural language processing in recent years. This study compares the performance of several sentiment analysis techniques, including lexicon-based, machine learning, Bi-LSTM, BERT, and GPT-3 approaches, on two commonly used datasets: IMDB reviews and Sentiment140. The objective is to identify the best-performing technique for an exemplar dataset: tweets associated with the Ninth Conference of the Parties (COP9) to the WHO Framework Convention on Tobacco Control, held in 2021.
A two-stage evaluation was conducted. In the first stage, various techniques were compared on standard sentiment analysis datasets using standard evaluation metrics such as accuracy, F1-score, and precision. In the second stage, the best-performing techniques from the first stage were applied to partially annotated COP9 conference-related tweets.
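The metrics named above can be illustrated with a minimal sketch for the binary (positive/negative) case. This is not the authors' evaluation code: the function name `binary_metrics`, the label encoding (1 = positive, 0 = negative), and the toy labels are assumptions made only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Hypothetical helper for illustration; labels are assumed to be
    1 (positive sentiment) and 0 (negative sentiment).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy gold labels and predictions, invented for this example.
    gold = [1, 1, 1, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0]
    print(binary_metrics(gold, pred))
```

F1 is the harmonic mean of precision and recall, so it penalizes a classifier that does well on one of the two and poorly on the other, which is one reason it is often reported alongside accuracy.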
In the first stage, BERT achieved the highest F1-scores (0.9380 for IMDB and 0.8114 for Sentiment140), followed by GPT-3 (0.9119 and 0.7913) and Bi-LSTM (0.8971 and 0.7778). In the second stage, GPT-3 performed best on the partially annotated COP9 conference-related tweets, with an F1-score of 0.8812.
The study demonstrates the effectiveness of pre-trained models such as BERT and GPT-3 for sentiment analysis tasks: both outperformed traditional techniques on the standard datasets. Moreover, the stronger performance of GPT-3 on the partially annotated COP9 tweets suggests that it generalizes well to domain-specific data with limited annotations. This gives researchers and practitioners a viable option for sentiment analysis across different domains when annotated data are limited or unavailable.