A machine learning-based approach for sentiment analysis on distance learning from Arabic Tweets.

Author information

Almalki Jameel

Affiliation

Department of Computer Science, College of Computer in Al-Leith, Umm Al-Qura University, Makkah, Saudi Arabia.

Publication information

PeerJ Comput Sci. 2022 Jul 26;8:e1047. doi: 10.7717/peerj-cs.1047. eCollection 2022.

Abstract

Social media platforms such as Twitter, YouTube, Instagram, and Facebook are now leading sources of large datasets. Twitter data is considered among the most reliable because of the platform's privacy policy. Tweets have been used for sentiment analysis and for identifying meaningful information within datasets. This study focuses on the distance learning domain in Saudi Arabia by analyzing Arabic tweets about distance learning. We propose a model that analyzes people's feedback using a Twitter dataset from this domain. The model is built on Apache Spark to manage the large dataset. It uses the Twitter API to collect tweets as raw data, which are stored on an Apache Spark server. A regex-based preprocessing step removes retweets, links, hashtags, English words and numbers, usernames, and emojis from the dataset. A Logistic Regression model is then trained on the preprocessed data and used to predict the sentiment of each tweet. Finally, a Flask application is built to perform sentiment analysis on Arabic tweets. The proposed model outperforms the other techniques it was compared against. Evaluated on test data, it achieves an Accuracy of 91%, an F1 Score of 90%, a Precision of 90%, and a Recall of 89%.
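The regex-based cleaning step mentioned in the abstract can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the author's code: the function name `clean_tweet` and every pattern are hypothetical, because the paper lists the categories it removes but not the expressions themselves. It also assumes that "retweets" refers to the leading `RT @user:` marker and that hashtags are dropped entirely rather than having only the `#` stripped.

```python
import re

# Hypothetical patterns: the paper names the categories it removes but does
# not publish its regexes, so these are illustrative choices only.
RETWEET_RE = re.compile(r"^RT\s+@\w+:\s*")       # leading "RT @user:" marker (assumption)
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # links
MENTION_RE = re.compile(r"@\w+")                 # usernames
HASHTAG_RE = re.compile(r"#\S+")                 # whole hashtag dropped (assumption)
EMOJI_RE = re.compile(
    "[\U0001F000-\U0001FAFF"   # pictographs, emoticons, flags, skin-tone modifiers
    "\u2600-\u27BF"            # miscellaneous symbols and dingbats
    "\uFE0F\u200D]+"           # variation selector-16 and zero-width joiner
)
# English words and numbers; in Python 3, \d also matches Arabic-Indic digits.
LATIN_DIGIT_RE = re.compile(r"[A-Za-z]+|\d+")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip tweet noise so that only the Arabic text remains (sketch)."""
    text = RETWEET_RE.sub("", text)
    # Order matters: URLs and mentions go before the Latin-letter pass,
    # and matches become spaces so neighbouring words do not fuse.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE, LATIN_DIGIT_RE):
        text = pattern.sub(" ", text)
    # Collapse the leftover runs of whitespace.
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = "RT @user1: التعليم عن بعد ممتاز \U0001F44D https://t.co/abc #تعليم_عن_بعد 2022 online"
    print(clean_tweet(raw))  # التعليم عن بعد ممتاز
```

In a pipeline like the one described, a function of this kind would be applied to every tweet (for example, wrapped as a Spark UDF) before feature extraction and Logistic Regression training; that wiring is not shown here.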
