Zhang Lili, Habibi Abbas
School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
University of Tehran, Tehran, Iran.
Sci Rep. 2025 Aug 27;15(1):31556. doi: 10.1038/s41598-025-17457-5.
Sentiment analysis using machine learning has become increasingly popular and has received considerable attention in recent years. Sentiment analysis is a critical and challenging task that requires highly accurate models. This study used the IMDb movie reviews dataset, which comprises 50,000 English reviews (25,000 for training and 25,000 for testing) with an equal number of positive and negative reviews. The dataset's distinctive characteristics, such as spelling errors, varying text lengths, and abbreviations, call for a multi-phase, unconventional approach to sentiment analysis. The data were thoroughly preprocessed by removing unwanted characters, correcting slang, removing stop words, tokenizing, stemming, and performing part-of-speech tagging. Two separate word embedding models, GloVe and Word2Vec, were then used for vectorization. Because there are two sentiment classes to consider (positive and negative), an Echo State Network (ESN) was used for binary classification, and its hyperparameters were tuned with the Augmented Water Cycle Algorithm (AWCA). With GloVe embeddings, the proposed ESN-AWCA achieved an F1-score of 96.37%, an accuracy of 96.39%, a recall of 95.87%, and a precision of 96.87%. With Word2Vec embeddings, it achieved an F1-score of 96.23%, an accuracy of 96.12%, a recall of 95.76%, and a precision of 96.71%. Overall, the proposed ESN-AWCA model performed strongly with both word embedding methods and outperformed the other models evaluated in the study. Statistical validation (a p value of 0.001 and effect sizes d > 1.1) supported the superiority of the proposed model.
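The preprocessing phase described above can be illustrated with a minimal sketch. The slang dictionary, stop-word list, and suffix-stripping stemmer below are hypothetical toy resources, not the ones used in the study; a real pipeline would use a full stop-word list and a proper stemmer (for example, Porter). Part-of-speech tagging is omitted because it needs a trained tagger such as NLTK's `pos_tag`.

```python
import re

# Hypothetical, tiny resources for illustration only; the study's actual
# slang dictionary, stop-word list, and stemmer are not specified here.
SLANG = {"u": "you", "gr8": "great", "luv": "love", "b4": "before"}
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "of", "to", "i", "you"}
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix if a stem of at least 3 characters remains (toy stemmer)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """Lowercase, clean, tokenize, correct slang, drop stop words, and stem one review."""
    text = re.sub(r"<[^>]+>", " ", review.lower())  # drop HTML tags such as <br />
    text = re.sub(r"[^a-z0-9\s]", " ", text)         # remove unwanted characters
    tokens = []
    for raw in text.split():                        # whitespace tokenization
        tokens.extend(SLANG.get(raw, raw).split())  # slang correction (may yield several words)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("This movie was GR8, I luv the acting!<br />"))
```

IMDb reviews contain HTML line breaks, which is why tag removal runs before character filtering; the order of the remaining steps is a design choice of this sketch.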
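To show how an ESN can classify a review once each token has been mapped to a GloVe or Word2Vec vector, here is a minimal sketch. It assumes a leaky-tanh reservoir, mean pooling of reservoir states over the review, and a ridge-regression readout; the abstract does not state the study's reservoir size, pooling, or readout, so `EchoStateClassifier`, `solve`, and all default values are hypothetical. The spectral radius and leak rate are typical ESN hyperparameters that an optimizer such as AWCA could tune.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class EchoStateClassifier:
    """Hypothetical minimal ESN: fixed random reservoir plus a trained linear readout."""

    def __init__(self, n_inputs, n_reservoir=50, spectral_radius=0.9,
                 leak_rate=0.3, ridge=1e-2, seed=0):
        rng = random.Random(seed)
        self.n, self.leak, self.ridge = n_reservoir, leak_rate, ridge
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_reservoir)]
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_reservoir)]
             for _ in range(n_reservoir)]
        # Scale by the max absolute row sum (an upper bound on the spectral radius),
        # so the true spectral radius is at most `spectral_radius`.
        norm = max(sum(abs(v) for v in row) for row in w)
        self.w = [[v * spectral_radius / norm for v in row] for row in w]
        self.w_out = None

    def features(self, sequence):
        """Feed word vectors through the reservoir; return the mean state plus a bias term."""
        x, total = [0.0] * self.n, [0.0] * self.n
        for u in sequence:
            pre = [sum(a * b for a, b in zip(self.w_in[i], u)) +
                   sum(a * b for a, b in zip(self.w[i], x)) for i in range(self.n)]
            x = [(1 - self.leak) * xi + self.leak * math.tanh(p) for xi, p in zip(x, pre)]
            total = [t + xi for t, xi in zip(total, x)]
        return [t / max(len(sequence), 1) for t in total] + [1.0]

    def fit(self, sequences, labels):
        """Train only the readout with ridge regression on targets +1 / -1."""
        feats = [self.features(s) for s in sequences]
        d = len(feats[0])
        targets = [1.0 if y == 1 else -1.0 for y in labels]
        # Normal equations: (F^T F + ridge * I) w = F^T y
        a = [[sum(f[i] * f[j] for f in feats) + (self.ridge if i == j else 0.0)
              for j in range(d)] for i in range(d)]
        b = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(d)]
        self.w_out = solve(a, b)
        return self

    def predict(self, sequence):
        """Return 1 (positive) if the readout score is above zero, else 0 (negative)."""
        score = sum(w * f for w, f in zip(self.w_out, self.features(sequence)))
        return 1 if score > 0 else 0
```

Only the readout weights are learned; the input and reservoir matrices stay fixed after random initialization, which is what keeps ESN training cheap. In use, each element of a sequence would be the embedding vector of one preprocessed token.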
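The hyperparameter tuning step can be illustrated with a simplified version of the standard Water Cycle Algorithm (WCA), not the augmented variant (AWCA) used in the study, whose modifications the abstract does not describe. In this sketch the best candidate is the "sea", the next few are "rivers", and the rest are "streams". Streams flow toward the sea or a river, rivers flow toward the sea, and rivers that come too close to the sea "evaporate" and are re-sampled at random. The round-robin stream assignment and all default values are simplifying assumptions. The function `water_cycle_minimize` is hypothetical and is demonstrated on a toy sphere function; in the study's setting the objective would instead be something like the validation error of an ESN at a candidate (spectral radius, leak rate).

```python
import math
import random


def water_cycle_minimize(objective, bounds, n_pop=30, n_rivers=4, iters=200,
                         c=2.0, d_max=1e-2, seed=0):
    """Simplified WCA sketch: minimize `objective` over the box given by `bounds`."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def flow(x, target):
        # x_new = x + rand * C * (target - x), clipped to the search bounds
        return [min(max(xi + rng.random() * c * (ti - xi), lo), hi)
                for xi, ti, (lo, hi) in zip(x, target, bounds)]

    pop = [random_point() for _ in range(n_pop)]
    for _ in range(iters):
        pop.sort(key=objective)              # best = sea, next n_rivers = rivers, rest = streams
        sea = pop[0]
        guides = pop[:n_rivers + 1]          # the sea and the rivers receive streams
        streams = [flow(s, guides[j % len(guides)])
                   for j, s in enumerate(pop[n_rivers + 1:])]
        rivers = [flow(r, sea) for r in pop[1:n_rivers + 1]]
        # Evaporation and raining: rivers too close to the sea are re-sampled at random.
        rivers = [random_point() if math.dist(r, sea) < d_max else r for r in rivers]
        d_max -= d_max / iters               # gradually shrink the evaporation threshold
        pop = [sea] + rivers + streams       # the sea never moves, so the best never worsens
    return min(pop, key=objective)


print(water_cycle_minimize(lambda x: sum(v * v for v in x), [(-2.0, 2.0), (-2.0, 2.0)]))
```

Keeping the sea fixed within an iteration makes the search elitist, while evaporation keeps injecting random candidates so the population does not collapse onto a single point too early.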
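The four reported metrics follow from the binary confusion counts, and the F1-score is the harmonic mean of precision and recall. The helper names `binary_metrics` and `f1_from` below are hypothetical. The sketch computes the metrics from label lists and checks that the reported F1-scores are consistent with the reported precision and recall values.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task (positive class = `positive`)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the reported precision/recall pairs.
print(round(100 * f1_from(0.9687, 0.9587), 2))  # GloVe: 96.37
print(round(100 * f1_from(0.9671, 0.9576), 2))  # Word2Vec: 96.23
```

Both recomputed F1-scores match the reported values to two decimal places.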