
DistilRoBiLSTMFuse: an efficient hybrid deep learning approach for sentiment analysis.

Authors

Papia Sonia Khan, Khan Md Asif, Habib Tanvir, Rahman Mizanur, Islam Md Nahidul

Affiliations

Information Technology, Washington University of Science & Technology, Alexandria, VA, United States of America.

International Relations, University of Dhaka, Dhaka, Bangladesh.

Publication information

PeerJ Comput Sci. 2024 Sep 26;10:e2349. doi: 10.7717/peerj-cs.2349. eCollection 2024.

DOI:10.7717/peerj-cs.2349
PMID:39650469
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11623128/
Abstract

Social media has become part of daily life, giving individuals a platform to express opinions and emotions openly online. In this setting, sentiment analysis (SA) is a vital tool for understanding the emotions conveyed in written text, whether positive, negative, or neutral. However, SA faces challenges such as diverse language, imbalanced data, and complex sentences. This study proposes an effective approach for SA: a hybrid architecture named DistilRoBiLSTMFuse, designed to extract deep contextual information from complex sentences and accurately identify sentiment. We evaluate the model on two popular benchmark datasets: IMDb and Twitter USAirline Sentiment. The raw text is preprocessed in several steps: (1) a comprehensive data cleaning protocol removes noise and unnecessary information, (2) a custom list of stopwords retains essential words while omitting common, non-informative ones, and (3) lemmatization reduces words to their base forms, making the text consistent and improving the accuracy of the analysis. To address class imbalance, the study uses oversampling, augmenting minority-class samples to match the majority so that all categories are uniformly represented. Because preprocessing techniques vary across previous studies, we first explore the efficacy of seven distinct machine learning (ML) models paired with two commonly used feature transformation methods: term frequency-inverse document frequency (TF-IDF) and bag of words (BoW). This determines which combination yields the best performance within these ML frameworks.
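The preprocessing and oversampling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the stopword list, the cleaning patterns, and the tiny lemma lookup table (a stand-in for a real lemmatizer) are all assumptions, and plain random duplication is only one possible form of oversampling.

```python
import random
import re
from collections import Counter
from typing import List

# Hypothetical custom stopword list: common words are dropped, but negations
# such as "not" and "no" are deliberately kept because they carry sentiment.
CUSTOM_STOPWORDS = {"the", "a", "an", "is", "was", "were", "to", "of", "and", "it", "this", "that"}

# Toy lemma lookup standing in for a real lemmatizer; the tool actually used
# in the paper is not named in the abstract.
TOY_LEMMAS = {"flights": "flight", "delayed": "delay", "movies": "movie", "loved": "love"}


def clean_text(text: str) -> str:
    """Lowercase and strip URLs, @mentions, HTML tags, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+|<[^>]+>", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str) -> List[str]:
    """Clean, remove custom stopwords, then map each token to its base form."""
    tokens = clean_text(text).split()
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens if tok not in CUSTOM_STOPWORDS]


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels
```

For example, `preprocess("This was not good")` keeps the negation and returns `["not", "good"]`, which a generic stopword list might have reduced to `["good"]`.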
DistilRoBiLSTMFuse is evaluated on both datasets and consistently delivers outstanding performance, surpassing existing state-of-the-art approaches in each case. On the IMDb dataset, the model achieves 98.91% accuracy in training, 94.16% in validation, and 93.97% in testing. On the Twitter USAirline Sentiment dataset, it reaches 99.42% accuracy in training, 98.52% in validation, and 98.33% in testing. These results demonstrate the effectiveness of the hybrid DistilRoBiLSTMFuse model in SA tasks. The code for this experimental analysis is publicly available at the following DOI: https://doi.org/10.5281/zenodo.13255008.
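The two feature transformations named in the abstract, BoW and TF-IDF, can be illustrated with a short sketch. It assumes documents are already tokenized and uses the unsmoothed weighting tf × ln(N / df), which is only one common TF-IDF variant; the abstract does not say which variant or library the authors used, and libraries such as scikit-learn apply a smoothed formula by default.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight raw counts by idf = ln(N / df), where df is the number of documents containing the term."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    return vocab, [[tf * w for tf, w in zip(row, idf)] for row in counts]


# Toy corpus (illustrative only, not taken from either benchmark dataset).
docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "flight"]]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
```

BoW keeps only how often each term occurs, while TF-IDF additionally down-weights terms that appear in many documents; under this unsmoothed variant, a term present in every document receives a weight of zero.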


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/e021fedb7124/peerj-cs-10-2349-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/c7ceeef78181/peerj-cs-10-2349-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/37c3aa78ece1/peerj-cs-10-2349-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/ffb4f8313e18/peerj-cs-10-2349-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/23fcb20fd7be/peerj-cs-10-2349-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/1bb4605bdefa/peerj-cs-10-2349-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/3c5bc19ffe15/peerj-cs-10-2349-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/96e68c86cfe6/peerj-cs-10-2349-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9427/11623128/aa4909ee2074/peerj-cs-10-2349-g009.jpg
