A hybrid deep learning model for sentiment analysis of COVID-19 tweets with class balancing.

Author information

Talukder Md Alamin, Uddin Md Ashraf, Roy Suman, Ghose Partho, Sarker Smita, Khraisat Ansam, Kazi Mohsin, Rahman Md Momtazur, Hakimi Musawer

Affiliations

Department of Computer Science and Engineering, International University of Business Agriculture and Technology, Dhaka, Bangladesh.

School of Information Technology, Crown Institute of Higher Education, Canberra, Australia.

Publication information

Sci Rep. 2025 Jul 30;15(1):27788. doi: 10.1038/s41598-025-97778-7.

DOI:10.1038/s41598-025-97778-7

PMID:40738947

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12311151/

Abstract

The widespread dissemination of misinformation and the diverse public sentiment observed during the COVID-19 pandemic highlight the necessity for accurate sentiment analysis of social media discourse. This study proposes a hybrid deep learning (DL) model that integrates Bidirectional Encoder Representations from Transformers (BERT) for contextual feature extraction with Long Short-Term Memory (LSTM) networks for sequential learning to classify COVID-19-related sentiments. To enhance data quality, advanced text preprocessing techniques, including Unicode normalization, contraction expansion, and emoji conversion, are applied. Additionally, to mitigate class imbalance, Random OverSampling (ROS) is employed, leading to significant improvements in model performance. Before applying ROS, the model exhibited lower accuracy and inconsistent performance across sentiment categories. After balancing the dataset, accuracy for binary classification increased to 92.10%, with corresponding precision, sensitivity, and specificity of 92.10%, 92.10%, and 91.50%, respectively. For three-class sentiment classification, accuracy improved to 89.47%, with precision, sensitivity, and specificity of 89.80%, 89.47%, and 94.10%, respectively. In five-class sentiment classification, accuracy reached 81.78%, with precision, sensitivity, and specificity of 82.19%, 81.78%, and 95.28%, respectively. These findings demonstrate the efficacy of combining deep learning-based sentiment analysis with advanced text preprocessing and class balancing techniques for accurately classifying public sentiment related to COVID-19 across multiple sentiment categories.
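
The preprocessing and class-balancing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names `preprocess` and `random_oversample`, the `CONTRACTIONS` and `EMOJI_NAMES` lookup tables, and the choice of NFKC as the normalization form are all assumptions made for illustration, and the BERT-LSTM classifier itself is not shown.

```python
import random
import re
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny lookup tables (assumptions, not from the paper).
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "i'm": "i am"}
EMOJI_NAMES = {"\U0001F637": "face_with_medical_mask", "\U0001F600": "grinning_face"}

_CONTRACTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def preprocess(text):
    """Apply Unicode normalization, contraction expansion, and emoji conversion."""
    # NFKC folds compatibility characters (e.g. fullwidth letters) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # NFKC leaves the curly apostrophe untouched, so map it by hand.
    text = text.replace("\u2019", "'")
    # Expand contractions, matching case-insensitively.
    text = _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)
    # Replace each known emoji with a textual token.
    for symbol, name in EMOJI_NAMES.items():
        text = text.replace(symbol, f" :{name}: ")
    # Collapse the extra whitespace introduced above.
    return re.sub(r"\s+", " ", text).strip()


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement from the class's own items to fill the gap.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    tweets = [f"tweet {i}" for i in range(7)]
    sentiments = ["positive"] * 5 + ["negative"] * 2
    _, balanced = random_oversample(tweets, sentiments)
    print(Counter(balanced))
    print(preprocess("I can\u2019t breathe \U0001F637"))
```

Oversampling is commonly applied to the training split only, so that duplicated tweets do not appear in both training and evaluation data; the abstract does not state how the authors partitioned the data.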
