Altamimi Mohammed, Alayba Abdulaziz M
Department of Information and Computer Science, College of Computer Science and Engineering, University of Ha'il, Ha'il, 81481, Saudi Arabia.
Data Brief. 2023 Jul 29;50:109460. doi: 10.1016/j.dib.2023.109460. eCollection 2023 Oct.
In this paper, we present a Modern Standard Arabic dataset of Arabic news articles collected over a one-year period, from 01/01/2021 to 12/31/2021. In total, more than 500,000 articles were collected from 12 Arabic news websites. The selection was driven by the variety of topics covered, including sports, economy, local news, politics, technology, tourism, entertainment, cars, health, and art. This dataset will enable data scientists to explore and experiment effectively in the field of natural language processing, and it can also be used to develop machine learning and deep learning models that classify articles by topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main.
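The abstract notes that the dataset can support models that classify articles by topic. The Python below is a minimal, hedged sketch of such a baseline. It trains a multinomial Naive Bayes classifier on (text, topic) pairs. Naive Bayes is a classical technique chosen here only for illustration; it is not the authors' method. The class name `NaiveBayesTopicClassifier`, the regex tokenizer, and the toy Arabic records are all hypothetical. This abstract does not describe the file format of the GitHub repository, so the sketch does not load data from it.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: \w+ on a str pattern is Unicode-aware in Python 3,
# so it matches runs of Arabic letters as well as Latin letters and digits.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    return TOKEN_RE.findall(text)


class NaiveBayesTopicClassifier:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # number of documents per topic
        self.word_counts = defaultdict(Counter)  # token frequencies per topic
        self.vocab = set()                       # all tokens seen in training

    def fit(self, records):
        for text, topic in records:
            tokens = tokenize(text)
            self.doc_counts[topic] += 1
            self.word_counts[topic].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_topic, best_score = None, -math.inf
        for topic, n_docs in self.doc_counts.items():
            counts = self.word_counts[topic]
            total_tokens = sum(counts.values())
            # Log prior plus the sum of Laplace-smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # skip tokens never seen during training
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Hypothetical toy records using two of the topics named in the abstract.
records = [
    ("فاز الفريق في المباراة", "sports"),
    ("الفريق يفوز بالدوري", "sports"),
    ("ارتفع سعر النفط في السوق", "economy"),
    ("البنك المركزي يرفع سعر الفائدة", "economy"),
]

clf = NaiveBayesTopicClassifier().fit(records)
print(clf.predict("فاز الفريق"))  # sports
print(clf.predict("سعر النفط"))   # economy
```

The regex tokenizer does not split attached Arabic prefixes such as the ب and ال in بالدوري. A practical model trained on the full dataset would therefore likely benefit from text normalization or morphological segmentation.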