VERA-ARAB: unveiling the Arabic tweets credibility by constructing balanced news dataset for veracity analysis.

Authors

Mostafa Mohamed A, Almogren Ahmad

Affiliations

Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Chair of Cyber Security, Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

Publication information

PeerJ Comput Sci. 2024 Oct 30;10:e2432. doi: 10.7717/peerj-cs.2432. eCollection 2024.

DOI: 10.7717/peerj-cs.2432
PMID: 39650406
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11623204/

Abstract

The proliferation of fake news on social media platforms necessitates the development of reliable datasets for effective fake news detection and veracity analysis. In this article, we introduce a veracity dataset of Arabic tweets called "VERA-ARAB", a pioneering large-scale dataset designed to enhance fake news detection in Arabic tweets. VERA-ARAB is a balanced, multi-domain, and multi-dialectal dataset, containing both fake and true news, meticulously verified by fact-checking experts from Misbar. Comprising approximately 20,000 tweets from 13,000 distinct users and covering 884 different claims, the dataset includes detailed information such as news text, user details, and spatiotemporal data, spanning diverse domains like sports and politics. We leveraged the X API to retrieve and structure the dataset, providing a comprehensive data dictionary to describe the raw data and conducting a thorough statistical descriptive analysis. This analysis reveals insightful patterns and distributions, visualized according to data type and nature. We also evaluated the dataset using multiple machine learning classification models, exploring various social and textual features. Our findings indicate promising results, particularly with textual features, underscoring the dataset's potential for enhancing fake news detection. Furthermore, we outline future work aimed at expanding VERA-ARAB to establish it as a benchmark for Arabic tweets in fake news detection. We also discuss other potential applications that could leverage the VERA-ARAB dataset, emphasizing its value and versatility for advancing the field of fake news detection in Arabic social media. Potential applications include user veracity assessment, topic modeling, and named entity recognition, demonstrating the dataset's wide-ranging utility for broader research in information quality management on social media.
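
As a rough illustration of the descriptive statistics mentioned in the abstract (tweet, user, and claim counts, and the balance between fake and true labels), the sketch below summarizes a handful of made-up records. The field names (`tweet_id`, `user_id`, `claim_id`, `label`), the sample rows, and the 5% balance tolerance are all assumptions for illustration; they are not taken from the VERA-ARAB data dictionary, and this is not the authors' analysis code.

```python
from collections import Counter

# Hypothetical records; the field names and values are illustrative
# assumptions, not the dataset's actual data dictionary.
tweets = [
    {"tweet_id": 1, "user_id": "u1", "claim_id": "c1", "label": "fake"},
    {"tweet_id": 2, "user_id": "u2", "claim_id": "c1", "label": "fake"},
    {"tweet_id": 3, "user_id": "u1", "claim_id": "c2", "label": "true"},
    {"tweet_id": 4, "user_id": "u3", "claim_id": "c3", "label": "true"},
]


def summarize(records):
    """Return tweet, user, and claim counts plus the share of each label."""
    label_counts = Counter(r["label"] for r in records)
    total = len(records)
    return {
        "tweets": total,
        "distinct_users": len({r["user_id"] for r in records}),
        "distinct_claims": len({r["claim_id"] for r in records}),
        "label_share": {k: v / total for k, v in label_counts.items()},
    }


def is_balanced(label_share, tolerance=0.05):
    """True when every class share is within `tolerance` of an even split."""
    even = 1 / len(label_share)
    return all(abs(share - even) <= tolerance for share in label_share.values())


summary = summarize(tweets)
print(summary)
print("balanced:", is_balanced(summary["label_share"]))
```

The same two functions could be pointed at real records once they are loaded into a list of dictionaries; the tolerance is a free parameter, since the abstract does not say how close to an even split the dataset is.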



