Evolutionary Optimization of Ensemble Learning to Determine Sentiment Polarity in an Unbalanced Multiclass Corpus.

Authors

García-Mendoza Consuelo V, Gambino Omar J, Villarreal-Cervantes Miguel G, Calvo Hiram

Affiliations

Escuela Superior de Cómputo, Instituto Politécnico Nacional, Mexico City 07738, Mexico.

Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Mexico City 07700, Mexico.

Publication information

Entropy (Basel). 2020 Sep 12;22(9):1020. doi: 10.3390/e22091020.

DOI:10.3390/e22091020
PMID:33286789
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7597113/
Abstract

Sentiment polarity classification in social media is an important task, as it enables gathering trends on particular subjects from a set of opinions. Great advances have recently been made using deep learning techniques such as word embeddings, recurrent neural networks, and encoders such as BERT. Unfortunately, these techniques require large amounts of data, which in some cases are not available. To model this situation, challenges such as the Spanish TASS, organized by the Spanish Society for Natural Language Processing (SEPLN), have been proposed. They pose particular difficulties. First, the split between the training and test sets is unwieldy, the latter being more than eight times the size of the former. Second, the class distribution is markedly unbalanced and also differs between the two sets. Finally, there are four different labels, which creates the need to adapt current classification methods for multiclass handling. Traditional machine learning methods, such as Naïve Bayes, Logistic Regression, and Support Vector Machines, achieve modest performance under these conditions, but used as an ensemble they can attain competitive performance. Several strategies to build classifier ensembles have been proposed; this paper proposes estimating an optimal weighting scheme using a Differential Evolution algorithm focused on the particular issues that multiclass classification and unbalanced corpora pose. The ensemble with the proposed optimized weighting scheme improves the classification results on the full test set of the TASS challenge (General corpus), achieving state-of-the-art performance when compared with other works on this task that make no use of NLP techniques.
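The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the three simulated base classifiers, the macro-F1 fitness, the DE/rand/1/bin variant, and every function name and parameter value below are assumptions chosen for illustration, since the abstract does not specify them. The sketch searches for one weight per classifier so that the weighted sum of their class probabilities scores well on an unbalanced four-class problem.

```python
import random

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    total = 0.0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom) if denom else 0.0
    return total / n_classes

def ensemble_predict(weights, probas):
    """probas[m][i][c] is the probability model m gives class c for sample i."""
    n_samples, n_classes = len(probas[0]), len(probas[0][0])
    preds = []
    for i in range(n_samples):
        scores = [sum(w * p[i][c] for w, p in zip(weights, probas))
                  for c in range(n_classes)]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds

def differential_evolution(fitness, dim, init, pop_size=20, gens=40,
                           F=0.8, CR=0.9, seed=0):
    """DE/rand/1/bin maximising `fitness` over the box [0, 1]^dim."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    pop[0] = list(init)  # seed with a baseline so the result is never worse than it
    fit = [fitness(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            f = fitness(trial)
            if f >= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def make_toy_data(n=200, n_classes=4, seed=1):
    """Unbalanced labels plus three simulated classifiers of differing accuracy."""
    rng = random.Random(seed)
    y = rng.choices(range(n_classes), weights=[50, 30, 15, 5], k=n)
    probas = []
    for acc in (0.7, 0.6, 0.5):  # hypothetical accuracies, not from the paper
        model = []
        for t in y:
            guess = t if rng.random() < acc else rng.randrange(n_classes)
            row = [0.1] * n_classes
            row[guess] = 0.7
            model.append(row)
        probas.append(model)
    return y, probas

y, probas = make_toy_data()
fitness = lambda w: macro_f1(y, ensemble_predict(w, probas), 4)
uniform_score = fitness([1.0, 1.0, 1.0])
best_w, best_score = differential_evolution(fitness, dim=3, init=[1.0, 1.0, 1.0])
print("uniform weights macro-F1:", round(uniform_score, 3))
print("optimized weights:", [round(w, 3) for w in best_w],
      "macro-F1:", round(best_score, 3))
```

For brevity the weights here are scored on the same samples used to optimize them; a real experiment would tune them on training or validation data and report results on a held-out test set.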

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ce0/7597113/7351f3923b0c/entropy-22-01020-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ce0/7597113/9746e6da26cf/entropy-22-01020-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ce0/7597113/1c1c77e27d3d/entropy-22-01020-g002.jpg
