Sanadset 650K: Data on Hadith narrators.

Authors

Mghari Mohammed, Bouras Omar, El Hibaoui Abdelaaziz

Affiliation

Abdelmalek Essaâdi University, Faculty of Science, Computer Science Department, P.O. Box 2121 M'Hannech II, Tetuan, 93030, Morocco.

Publication information

Data Brief. 2022 Aug 17;44:108540. doi: 10.1016/j.dib.2022.108540. eCollection 2022 Oct.

DOI:10.1016/j.dib.2022.108540

PMID:36065202

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9440281/

Abstract

The chain of narrators (Sanad) plays a vital role in determining the authenticity of Islamic hadiths. However, investigating and validating a Sanad depends entirely on Hadith scholars, who rely on their acquired knowledge, a process that demands considerable time and effort. Automated Sanad evaluation using machine learning algorithms is a promising way to solve this problem, and it requires a representative Sanad dataset. This paper presents a full hadith dataset, named Sanadset, which is made openly accessible to researchers. The Sanadset corpus contains over 650,986 records collected from 926 historical Arabic books of hadith. The dataset can be used for further investigation and classification of hadiths (strong/weak) and narrators (trustworthy/not) using AI techniques, and as a linguistic resource for Arabic natural language processing. It was collected from online Hadith sources using data scraping and web crawling. Its main contribution is the extraction of narrator chains that were originally present in textual form within Hadith books. Each observation in the dataset contains complete information about a specific hadith: original book, number, Hadith text, Matn, list of narrators, and number of narrators.
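The record layout described in the abstract can be illustrated with a minimal Python sketch. The class name `HadithRecord`, its field names, the helper `narrator_frequencies`, and the two sample records are all hypothetical placeholders chosen for illustration; they are not the dataset's actual column names or contents, which should be checked against the published files.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HadithRecord:
    """One observation. Field names are assumptions, not the real column names."""
    book: str             # original book
    number: int           # hadith number within that book
    hadith_text: str      # full text (narrator chain followed by Matn)
    matn: str             # body of the hadith without the chain
    narrators: List[str]  # extracted chain of narrators, in order

    @property
    def narrator_count(self) -> int:
        # The "number of narrators" field is derivable from the list.
        return len(self.narrators)


def narrator_frequencies(records: Iterable[HadithRecord]) -> Counter:
    """Count how many records each narrator appears in."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.narrators)
    return counts


# Made-up placeholder records, not taken from the dataset.
records = [
    HadithRecord("Book A", 1, "Narrator X from Narrator Y: text one",
                 "text one", ["Narrator X", "Narrator Y"]),
    HadithRecord("Book B", 7, "Narrator Y: text two",
                 "text two", ["Narrator Y"]),
]
frequencies = narrator_frequencies(records)
```

Keeping the narrators as an ordered list, rather than as free text, is what would make chain-level tasks such as narrator classification or frequency analysis straightforward under this assumed layout.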

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad4d/9440281/3f8345264faf/gr1.jpg

Similar articles

Multi-IsnadSet MIS for Sahih Muslim Hadith with chain of narrators, based on multiple ISNAD.

Data Brief. 2024 Apr 23;54:110439. doi: 10.1016/j.dib.2024.110439. eCollection 2024 Jun.

Exploring the relationship between hadith narrators in Book of Bukhari through SPADE algorithm.

MethodsX. 2022 Sep 9;9:101850. doi: 10.1016/j.mex.2022.101850. eCollection 2022.

SANAD: Single-label Arabic News Articles Dataset for automatic text categorization.

Data Brief. 2019 Jun 4;25:104076. doi: 10.1016/j.dib.2019.104076. eCollection 2019 Aug.

Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems.

Data Brief. 2017 Feb 3;11:147-151. doi: 10.1016/j.dib.2017.01.011. eCollection 2017 Apr.

Hybrid machine learning approach for Arabic medical web page credibility assessment.

Health Informatics J. 2022 Jan-Mar;28(1):14604582211070998. doi: 10.1177/14604582211070998.

A scarce dataset for ancient Arabic handwritten text recognition.

Data Brief. 2024 Aug 8;56:110813. doi: 10.1016/j.dib.2024.110813. eCollection 2024 Oct.

A comprehensive dataset for Arabic word sense disambiguation.

Data Brief. 2024 Jun 4;55:110591. doi: 10.1016/j.dib.2024.110591. eCollection 2024 Aug.

ArASL: Arabic Alphabets Sign Language Dataset.

Data Brief. 2019 Feb 23;23:103777. doi: 10.1016/j.dib.2019.103777. eCollection 2019 Apr.

Parallel texts dataset for Uzbek-Kazakh machine translation.

Data Brief. 2024 Feb 15;53:110194. doi: 10.1016/j.dib.2024.110194. eCollection 2024 Apr.
