Khalil Ashwaq, Jarrah Moath, Aldwairi Monther, Jaradat Manar
Department of Computer Engineering, Jordan University of Science and Technology, PO Box 3030, Irbid 22110, Jordan.
College of Technological Innovation, Zayed University, Abu Dhabi, UAE.
Data Brief. 2022 Apr 8;42:108141. doi: 10.1016/j.dib.2022.108141. eCollection 2022 Jun.
News credibility detection has attracted growing attention as the volume of news on social media platforms has increased rapidly. This article presents the Arabic Fake News Dataset (AFND), a large, labeled, and diverse dataset collected from public Arabic news websites. The dataset enables the research community to apply supervised and unsupervised machine learning algorithms to classify the credibility of Arabic news articles. AFND consists of 606,912 public news articles scraped with Python scripts from 134 public news websites in 19 Arab countries over a 6-month period. Each news source was manually classified as credible, not credible, or undecided using the Arabic fact-checking platform Misbar. Weak supervision was then applied: every article receives the same label as the source that published it. AFND is imbalanced in the number of articles per class, which makes it useful for researchers working on solutions for imbalanced datasets. The dataset is available in JSON format from the Mendeley Data repository.
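The source-level weak supervision described in the abstract can be illustrated with a minimal Python sketch. The record fields (`source`, `title`), the source names, and the helper functions below are hypothetical and are not taken from the AFND files; the actual JSON layout should be checked in the Mendeley Data repository. The inverse-frequency class weights are one common way to handle class imbalance and are shown only as an assumption, not as the authors' method.

```python
from collections import Counter

# Hypothetical source-level labels (in AFND these come from a manual
# check against Misbar; the names and values here are made up).
source_labels = {
    "source_a": "credible",
    "source_b": "not credible",
    "source_c": "undecided",
}

# Hypothetical article records; the real AFND JSON schema may differ.
articles = [
    {"source": "source_a", "title": "article 1"},
    {"source": "source_a", "title": "article 2"},
    {"source": "source_a", "title": "article 3"},
    {"source": "source_b", "title": "article 4"},
    {"source": "source_c", "title": "article 5"},
    {"source": "source_c", "title": "article 6"},
]


def propagate_labels(articles, source_labels):
    """Weak supervision: give each article the label of its source."""
    return [{**a, "label": source_labels[a["source"]]} for a in articles]


def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


labeled = propagate_labels(articles, source_labels)
class_counts = Counter(a["label"] for a in labeled)
weights = inverse_frequency_weights(class_counts)

print(class_counts)
print(weights)
```

Because every article takes its label from its source, any labeling error at the source level carries over to all of that source's articles, which is worth keeping in mind when evaluating models trained on labels produced this way.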