Das Rajesh Kumar, Islam Mirajul, Hasan Md Mahmudul, Razia Sultana, Hassan Mocksidul, Khushbu Sharun Akter
Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh.
Faculty of Graduate Studies, Daffodil International University, Dhaka 1341, Bangladesh.
Heliyon. 2023 Sep 19;9(9):e20281. doi: 10.1016/j.heliyon.2023.e20281. eCollection 2023 Sep.
This paper investigates the efficacy of various machine learning models, including deep learning and hybrid models, for text classification in English and Bangla. The study focuses on sentiment analysis of comments from a popular Bengali e-commerce site, "DARAZ," comprising both Bangla and translated English reviews. Its primary objective is a comparative analysis of these models for sentiment analysis. The methodology includes implementing seven machine learning models and deep learning models, such as Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), one-dimensional convolution (Conv1D), and a combined Conv1D-LSTM. Preprocessing techniques are applied to a modified text set to enhance model accuracy. The main finding is that Support Vector Machine (SVM) models outperform the other models, achieving an accuracy of 82.56% for English text sentiment analysis and 86.43% for Bangla text sentiment analysis using the Porter stemming algorithm. Among the deep learning models, the Bi-LSTM-based model performs best, achieving an accuracy of 78.10% for English text and 83.72% for Bangla text using Porter stemming. The study advances natural language processing research, particularly for Bangla, by improving text classification models and methodologies, and its results offer insights for future research and practical applications of sentiment analysis.
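As a rough illustration of the kind of pipeline the abstract describes (stemming, then a linear SVM for sentiment), the following minimal Python sketch may help. It is not the authors' implementation: the four reviews are invented toy data rather than DARAZ comments, `strip_suffix` is a crude stand-in for a real Porter stemmer, and the hand-rolled hinge-loss trainer and its hyperparameters (`epochs`, `lr`, `reg`) are assumptions chosen only to keep the example dependency-free.

```python
import re
from collections import Counter

# Hypothetical toy reviews, NOT taken from the DARAZ dataset.
# Labels: +1 = positive, -1 = negative.
TRAIN = [
    ("Great product, loved it", 1),
    ("Fast delivery, works great", 1),
    ("Bad product, stopped working", -1),
    ("Delivery was late, bad packaging", -1),
]


def strip_suffix(token):
    """Crude suffix stripper; a simplified stand-in for Porter stemming."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def featurize(text):
    """Lowercase, tokenize, stem, and count: a bag-of-words feature map."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(strip_suffix(t) for t in tokens)


def score(weights, bias, feats):
    """Linear decision function w.x + b over sparse features."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items()) + bias


def train_linear_svm(samples, epochs=20, lr=0.1, reg=0.01):
    """Linear SVM trained by subgradient descent on the L2-regularized hinge loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in samples:
            margin = label * score(weights, bias, feats)
            # Gradient step for the L2 regularizer: shrink every weight.
            for f in weights:
                weights[f] *= 1.0 - lr * reg
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + lr * label * v
                bias += lr * label
    return weights, bias


def predict(model, text):
    """Return +1 (positive) or -1 (negative) for a raw review string."""
    weights, bias = model
    return 1 if score(weights, bias, featurize(text)) >= 0 else -1


SAMPLES = [(featurize(text), label) for text, label in TRAIN]
MODEL = train_linear_svm(SAMPLES)
train_accuracy = sum(predict(MODEL, t) == y for t, y in TRAIN) / len(TRAIN)
print(f"training accuracy: {train_accuracy:.2f}")
```

Calling `predict(MODEL, "late and bad")` classifies a new string with the trained weights. In practice one would more likely use an established stemmer and SVM implementation, and report accuracy on held-out data rather than on the training set as this toy example does.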