Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews.

Authors

Chatterjee Ishani, Zhou Mengchu, Abusorrah Abdullah, Sedraoui Khaled, Alabdulwahab Ahmed

Affiliations

Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA.

Department of Electrical and Computer Engineering, Faculty of Engineering, and Center of Research Excellence in Renewable Energy and Power Systems, King Abdulaziz University, Jeddah 21481, Saudi Arabia.

Publication information

Entropy (Basel). 2021 Dec 7;23(12):1645. doi: 10.3390/e23121645.

DOI:10.3390/e23121645
PMID:34945950
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8700267/
Abstract

People nowadays use the internet to project their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source to gather data for data analytics, sentiment analysis, natural language processing, etc. Conventionally, the true sentiment of a customer review matches its corresponding star rating. There are exceptions when the star rating of a review is opposite to its true nature. These are labeled as the outliers in a dataset in this work. The state-of-the-art methods for anomaly detection involve manual searching, predefined rules, or traditional machine learning techniques to detect such instances. This paper conducts a sentiment analysis and outlier detection case study for Amazon customer reviews, and it proposes a statistics-based outlier detection and correction method (SODCM), which helps identify such reviews and rectify their star ratings to enhance the performance of a sentiment analysis algorithm without any data loss. This paper focuses on performing SODCM in datasets containing customer reviews of various products, which are (a) scraped from Amazon.com and (b) publicly available. The paper also studies the dataset and concludes the effect of SODCM on the performance of a sentiment analysis algorithm. The results exhibit that SODCM achieves higher accuracy and recall percentage than other state-of-the-art anomaly detection algorithms.
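
The abstract does not say which statistics SODCM uses, so the following is only a minimal sketch of the general idea (flag reviews whose text sentiment contradicts their star rating, then correct the rating instead of dropping the review). It is not the authors' method. Everything in it is an assumption made for illustration: the function name `correct_rating_outliers`, the precomputed sentiment score in [-1, 1] for each review, the modified z-score rule based on the median absolute deviation, the 3.5 cutoff, and the nearest-median correction step.

```python
from collections import defaultdict
from statistics import median


def correct_rating_outliers(reviews, threshold=3.5):
    """Flag and relabel reviews whose sentiment is atypical for their star rating.

    reviews: list of (star_rating, sentiment_score) pairs. The sentiment score
    is assumed to come from some text classifier; it is not computed here.
    Returns a list of (star_rating, sentiment_score, was_corrected) tuples with
    the same length and order as the input, so no review is discarded.
    """
    # Group sentiment scores by the star rating the reviewer gave.
    by_star = defaultdict(list)
    for star, score in reviews:
        by_star[star].append(score)

    # Robust centre (median) and spread (median absolute deviation) per rating.
    centers = {star: median(scores) for star, scores in by_star.items()}
    spreads = {
        star: median([abs(s - centers[star]) for s in scores])
        for star, scores in by_star.items()
    }

    results = []
    for star, score in reviews:
        mad = spreads[star]
        # Modified z-score; a group with zero spread is left untouched.
        z = 0.6745 * (score - centers[star]) / mad if mad > 0 else 0.0
        if abs(z) > threshold:
            # Hypothetical correction rule: move the review to the rating
            # whose typical sentiment is closest to this review's score.
            new_star = min(centers, key=lambda s: abs(score - centers[s]))
            results.append((new_star, score, True))
        else:
            results.append((star, score, False))
    return results
```

Called on a list that mixes 5-star and 1-star reviews, the function returns one tuple per input review, so the dataset keeps its size. This mirrors the "without any data loss" property the abstract describes, though the actual correction rule in the paper may differ.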

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384d/8700267/02cd77c9b0ce/entropy-23-01645-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384d/8700267/23fe1439b6ac/entropy-23-01645-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384d/8700267/9018810f8126/entropy-23-01645-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384d/8700267/fc67a5f6e3b5/entropy-23-01645-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384d/8700267/1e30c12c7e9c/entropy-23-01645-g005.jpg

Similar articles

1. Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews. Entropy (Basel). 2021 Dec 7;23(12):1645. doi: 10.3390/e23121645.
2. A data package for abstractive opinion summarization, title generation, and rating-based sentiment prediction for airline reviews. Data Brief. 2023 Sep 1;50:109535. doi: 10.1016/j.dib.2023.109535. eCollection 2023 Oct.
3. STAR_outliers: a python package that separates univariate outliers from non-normal distributions. BioData Min. 2023 Sep 4;16(1):25. doi: 10.1186/s13040-023-00342-0.
4. Context-based sentiment analysis on customer reviews using machine learning linear models. PeerJ Comput Sci. 2021 Dec 17;7:e813. doi: 10.7717/peerj-cs.813. eCollection 2021.
5. An experimental study on the performance of collaborative filtering based on user reviews for large-scale datasets. PeerJ Comput Sci. 2023 Aug 25;9:e1525. doi: 10.7717/peerj-cs.1525. eCollection 2023.
6. Natural language processing for analyzing online customer reviews: a survey, taxonomy, and open research challenges. PeerJ Comput Sci. 2024 Jul 19;10:e2203. doi: 10.7717/peerj-cs.2203. eCollection 2024.
7. Multinomial Naive Bayesian Classifier Framework for Systematic Analysis of Smart IoT Devices. Sensors (Basel). 2022 Sep 27;22(19):7318. doi: 10.3390/s22197318.
8. Towards improving e-commerce customer review analysis for sentiment detection. Sci Rep. 2022 Dec 20;12(1):21983. doi: 10.1038/s41598-022-26432-3.
9. Sentiment Analysis of Online Patient-Written Reviews of Vascular Surgeons. Ann Vasc Surg. 2023 Jan;88:249-255. doi: 10.1016/j.avsg.2022.07.016. Epub 2022 Aug 23.
10. Sentiment analysis of clinical narratives: A scoping review. J Biomed Inform. 2023 Apr;140:104336. doi: 10.1016/j.jbi.2023.104336. Epub 2023 Mar 22.

Cited by

1. Natural language processing for analyzing online customer reviews: a survey, taxonomy, and open research challenges. PeerJ Comput Sci. 2024 Jul 19;10:e2203. doi: 10.7717/peerj-cs.2203. eCollection 2024.
2. Emotional Variance Analysis: A new sentiment analysis feature set for Artificial Intelligence and Machine Learning applications. PLoS One. 2023 Jan 12;18(1):e0274299. doi: 10.1371/journal.pone.0274299. eCollection 2023.
