Semantic relational machine learning model for sentiment analysis using cascade feature selection and heterogeneous classifier ensemble.

Author information

Yenkikar Anuradha, Babu C Narendra, Hemanth D Jude

Affiliations

Department of Computer Science and Engineering, M. S. Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India.

Department of Electronics and Communications Engineering, Karunya University, Coimbatore, Tamil Nadu, India.

Publication information

PeerJ Comput Sci. 2022 Sep 20;8:e1100. doi: 10.7717/peerj-cs.1100. eCollection 2022.

Abstract

The exponential rise of social media microblogging sites such as Twitter has sparked interest in sentiment analysis, which exploits user feedback on a targeted product or service. Given its significance for business intelligence and decision-making, numerous efforts have been made in this area. However, these approaches have been plagued by a lack of dictionaries, unannotated data, large-scale unstructured data, and low accuracy. Moreover, sentiment classification through classifier ensembles has been underexplored in the literature. In this article, we propose a Semantic Relational Machine Learning (SRML) model that automatically classifies the sentiment of tweets using a classifier ensemble and optimal features. The model employs the Cascaded Feature Selection (CFS) strategy, a novel statistical assessment approach based on the Wilcoxon rank sum test, a univariate logistic regression assisted significant predictor test, and a cross-correlation test. It further combines word2vec-based continuous bag-of-words and n-gram feature extraction with SentiWordNet to find optimal features for classification. We experiment on six public Twitter sentiment datasets: the STS-Gold dataset, the Obama-McCain Debate (OMD) dataset, the healthcare reform (HCR) dataset, and SemEval2017 Task 4A, 4B and 4C, using a heterogeneous classifier ensemble comprising fourteen individual classifiers from different paradigms. Results from the experimental study indicate that CFS helps attain higher classification accuracy with up to 50% fewer features than the count vectorizer approach. In intra-model performance assessment, the Artificial Neural Network-Gradient Descent (ANN-GD) classifier performs better than the other individual classifiers, but the Best Trained Ensemble (BTE) strategy outperforms them on all metrics. In inter-model performance assessment against existing state-of-the-art systems, the proposed model achieves higher accuracy and outperforms more established models employing quantum-inspired sentiment representation (QSR), transformer-based methods such as BERT, BERTweet and RoBERTa, and ensemble techniques. The research thus provides critical insights into applying a similar strategy to build more generic and robust expert systems for sentiment analysis that can be leveraged across industries.
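
The abstract names the three tests that make up CFS but gives no thresholds, ordering details, or implementation. As a rough illustration only, the following is a minimal sketch of how such a cascade could be chained for binary labels. Everything specific in it is an assumption rather than the authors' method: the function names, the 0.05 significance level, the 0.9 correlation cutoff, the use of a Wald test as the "significant predictor" check, and Pearson correlation as the cross-correlation measure. It uses only the Python standard library and a tiny synthetic dataset.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def rank_sum_pvalue(a, b):
    """Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return two_sided_p((w - mean) / sd)

def logistic_wald_pvalue(x, y, iters=25):
    """Fit y ~ sigmoid(b0 + b1*x) by Newton's method; Wald p-value for b1."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:  # constant or degenerate feature: treat as not significant
            return 1.0
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error from the information matrix of the last Newton step.
    return two_sided_p(b1 / math.sqrt(h00 / det))

def pearson(u, v):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def cascade_select(columns, y, alpha=0.05, max_corr=0.9):
    """Return indices of features that survive all three stages in order."""
    kept = []
    for j, col in enumerate(columns):
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        if rank_sum_pvalue(pos, neg) >= alpha:      # stage 1: rank-sum test
            continue
        if logistic_wald_pvalue(col, y) >= alpha:   # stage 2: univariate logistic test
            continue
        if any(abs(pearson(col, columns[k])) > max_corr for k in kept):
            continue                                # stage 3: redundancy filter
        kept.append(j)
    return kept

# Synthetic data (hypothetical): 220 samples with alternating 0/1 labels.
n = 220
labels = [i % 2 for i in range(n)]
informative = [5 * (i % 2) + (i // 2 * 7) % 11 for i in range(n)]  # shifted by class
uninformative = [(i // 2) % 5 for i in range(n)]  # same distribution in both classes
redundant = [2 * v + 1 for v in informative]      # linear copy of `informative`

selected = cascade_select([informative, uninformative, redundant], labels)
print(selected)
```

On this toy data the uninformative column is rejected by the rank-sum stage and the linear copy by the correlation stage, leaving only the first column. A real pipeline would apply the same idea to word2vec, n-gram, or SentiWordNet features and would likely rely on a statistics library rather than these hand-written approximations.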

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bc3/9575864/b7c2fc403634/peerj-cs-08-1100-g001.jpg