DistilRoBiLSTMFuse: an efficient hybrid deep learning approach for sentiment analysis.

Author information

Papia Sonia Khan, Khan Md Asif, Habib Tanvir, Rahman Mizanur, Islam Md Nahidul

Affiliations

Information Technology, Washington University of Science & Technology, Alexandria, VA, United States of America.

International Relations, University of Dhaka, Dhaka, Bangladesh.

Publication information

PeerJ Comput Sci. 2024 Sep 26;10:e2349. doi: 10.7717/peerj-cs.2349. eCollection 2024.

Abstract

Social media has become part of daily routines, providing a platform for individuals to express their opinions and emotions openly on the internet. Within this digital domain, sentiment analysis (SA) is a vital tool for understanding the emotions conveyed in written text, whether positive, negative, or neutral. However, SA faces challenges such as diverse language, imbalanced data, and complex sentences. This study proposes an effective approach for SA: a hybrid architecture named DistilRoBiLSTMFuse, designed to extract deep contextual information from complex sentences and accurately identify sentiments. We evaluate the model's performance on two popular benchmark datasets: IMDb and Twitter USAirline Sentiment. The raw text data are preprocessed in three steps: (1) applying a comprehensive data cleaning protocol to remove noise and unnecessary information from the raw text, (2) preparing a custom list of stopwords to retain essential words while omitting common, non-informative words, and (3) applying lemmatization to reduce words to their base forms, which makes the text consistent and improves the accuracy of text analysis. To address class imbalance, this study uses oversampling, augmenting minority class samples to match the majority class so that all categories are equally represented. Because preprocessing techniques vary across previous studies, our research first explores the efficacy of seven distinct machine learning (ML) models paired with two commonly employed feature transformation methods: term frequency-inverse document frequency (TF-IDF) and bag of words (BoW). This comparison determines which combination yields the best performance within these ML frameworks.
The DistilRoBiLSTMFuse model is evaluated on both datasets and consistently delivers strong performance, surpassing existing state-of-the-art approaches in each case. On the IMDb dataset, the model achieves 98.91% accuracy in training, 94.16% in validation, and 93.97% in testing. On the Twitter USAirline Sentiment dataset, it reaches 99.42% accuracy in training, 98.52% in validation, and 98.33% in testing. The experimental results demonstrate the effectiveness of the hybrid DistilRoBiLSTMFuse model in SA tasks. The code for this experimental analysis is publicly available and can be accessed at the following DOI: https://doi.org/10.5281/zenodo.13255008.
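
The preprocessing, oversampling, and feature transformation steps described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' code (which is available at the DOI above): the stopword list, the cleaning rules, the function names, and the particular TF-IDF variant are all assumptions chosen for illustration. Lemmatization is omitted because it needs a lexical resource that the standard library does not provide.

```python
import math
import random
import re
from collections import Counter

# Hypothetical custom stopword list (an assumption, not the paper's list).
# Negations such as "not" are deliberately kept because they flip sentiment.
CUSTOM_STOPWORDS = {"the", "a", "an", "is", "was", "to", "and", "of", "it", "this"}


def clean_text(text):
    """Lowercase, strip URLs, @mentions and non-letters, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in CUSTOM_STOPWORDS]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def bag_of_words(tokens):
    """BoW features for one document: raw term counts."""
    return Counter(tokens)


def tf_idf(docs):
    """TF-IDF weights per document, with tf = count / len(doc) and
    idf = log(N / df). Other smoothing variants exist; this one is assumed."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy data invented for this example (not drawn from either benchmark).
texts = [
    "The flight was great!",
    "Not good, @airline lost my bag http://t.co/x",
    "Great crew and great seats",
]
labels = ["positive", "negative", "positive"]

docs = [clean_text(t) for t in texts]
balanced_docs, balanced_labels = random_oversample(docs, labels)
bow_features = [bag_of_words(doc) for doc in balanced_docs]
tfidf_features = tf_idf(balanced_docs)
```

In this sketch the single negative example is duplicated so that both classes contain two samples, which mirrors the idea of matching the minority class to the majority. In practice, oversampling would normally be applied only to the training split so that duplicated samples do not leak into validation or test data.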


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/e021fedb7124/peerj-cs-10-2349-g001.jpg
