Alsemaree Ohud, Alam Atm S, Gill Sukhpal Singh, Uhlig Steve
School of Electronic Engineering and Computer Science, Queen Mary University of London, London, E1 4NS, UK.
Heliyon. 2024 May 1;10(11):e30320. doi: 10.1016/j.heliyon.2024.e30320. eCollection 2024 Jun 15.
Sentiment Analysis (SA) employing Natural Language Processing (NLP) is pivotal in determining the positivity and negativity of customer feedback. Although much SA research focuses on English texts, there is a growing demand for SA in other widely spoken languages, such as Arabic. This is predominantly due to the global reach of social media, which enables users to express opinions on products in any language and, in turn, necessitates a thorough understanding of customers' perceptions of new products based on social media conversations. However, existing studies fall short in providing text analysis for understanding the perceptions of Arabic customers towards coffee and coffee products. Therefore, this study proposes a comprehensive Lexicon-based Sentiment Analysis on Arabic Texts (LSAnArTe) framework, applied to social media data, to understand customer perceptions of coffee, a widely consumed product in the Arabic-speaking world. The LSAnArTe framework incorporates the existing AraSenTi dictionary, a database of sentiment scores for Arabic words, and lemmatizes unknown words using the Qalasadi open platform. It classifies each word as positive, negative or neutral before conducting sentence-level sentiment classification. Data collected from X (formerly known as Twitter), yielding a cleaned dataset of 10,769 tweets, are used to validate the proposed framework, which is then compared with Amazon Comprehend. The dataset was annotated manually to ensure maximum accuracy and reliability in validating the proposed LSAnArTe framework. The results revealed that the proposed LSAnArTe framework, with an accuracy of 93.79%, outperformed the Amazon Comprehend tool, which had an accuracy of 51.90%.
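The pipeline described in the abstract (lexicon lookup, lemmatization of unknown words, word-level labelling, then sentence-level classification and accuracy against manual annotations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy lexicon and its scores are not AraSenTi entries, the lemma table merely stands in for a real lemmatizer such as the one the authors use, and summing word scores is one plausible aggregation rule, since the abstract does not state how word labels are combined into a sentence label.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon (word -> sentiment score); NOT real AraSenTi data.
TOY_LEXICON: Dict[str, float] = {
    "لذيذ": 1.0,    # "delicious"
    "رائع": 1.0,    # "wonderful"
    "سيء": -1.0,    # "bad"
}

# Hypothetical lemma table standing in for a real Arabic lemmatizer.
TOY_LEMMAS: Dict[str, str] = {
    "لذيذة": "لذيذ",
    "رائعة": "رائع",
    "سيئة": "سيء",
}


def toy_lemmatize(word: str) -> str:
    """Return the lemma if the toy table knows the word, else the word itself."""
    return TOY_LEMMAS.get(word, word)


def word_score(word: str,
               lexicon: Dict[str, float] = TOY_LEXICON,
               lemmatize: Callable[[str], str] = toy_lemmatize) -> float:
    """Look the word up directly; if unknown, retry with its lemma; else score 0."""
    if word in lexicon:
        return lexicon[word]
    return lexicon.get(lemmatize(word), 0.0)


def label(score: float) -> str:
    """Map a numeric score to one of the three classes named in the abstract."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_sentence(sentence: str) -> str:
    """Assumed aggregation rule: sum the word scores, then label the total."""
    total = sum(word_score(w) for w in sentence.split())
    return label(total)


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predictions that match the manual annotations."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

With this toy data, `classify_sentence("القهوة لذيذة")` ("the coffee is delicious") resolves the unknown word through its lemma and returns `"positive"`. A real evaluation would replace the toy tables with the actual dictionary and lemmatizer and compare predictions against the annotated tweets using `accuracy`.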