Farooqi Aziz Mehmood, Malick Rauf Ahmed Shams, Shaikh Muhammad Shahzad, Akhunzada Adnan
College of Computing and Information Sciences, PAF Karachi Institute of Economics and Technology, Karachi, Pakistan.
Department of Computer Science, FAST-National University of Computer and Emerging Sciences, Karachi 75300, Pakistan.
Data Brief. 2024 Apr 23;54:110439. doi: 10.1016/j.dib.2024.110439. eCollection 2024 Jun.
In the Islamic domain, Hadiths hold significant importance, ranking as crucial texts after the Holy Quran. Each Hadith contains three main parts: the ISNAD (chain of narrators), the TARAF (starting part, often from Prophet Muhammad), and the MATN (Hadith content). The ISNAD is the chain of narrators involved in transmitting a particular MATN. Hadith scholars determine the trustworthiness of the transmitted MATN from the quality of its ISNAD. The ISNAD data is available in its original Arabic, with narrator names transliterated into English. This paper presents the Multi-IsnadSet (MIS), which has great potential for use by social scientists and theologians. A multi-directed graph structure is used to represent the complex interactions among the narrators of Hadith. The MIS dataset is a directed graph consisting of 2092 nodes, representing individual narrators, and 77,797 edges, representing the Sanad-Hadith connections. It captures the multiple ISNAD of each Hadith in the Sahih Muslim Hadith book. The dataset was carefully extracted from multiple online Hadith sources using data scraping and web crawling tools, providing extensive Hadith details. Each dataset entry provides a complete view of a specific Hadith, including the original book, Hadith number, textual content (MATN), list of narrators, narrator count, sequence of narrators, and ISNAD count. Four different tools were used for modeling and analyzing the narration network: the Python library NetworkX, the graph database Neo4j, and two network analysis tools, Gephi and Cytoscape. The Neo4j graph database is used to represent the multi-dimensional graph data, for ease of extraction and for establishing new relationships among nodes.
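As a rough illustration of the multi-directed graph described above, the following sketch builds parallel directed edges from dataset-style entries using only the Python standard library. The `HadithEntry` fields, the `build_multidigraph` helper, and the placeholder narrators "A" to "D" are assumptions made for this example and do not reflect the actual MIS schema. The edge direction simply follows the order in which narrators are listed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HadithEntry:
    """One dataset row; field names are hypothetical, not the MIS schema."""
    book: str
    hadith_number: int
    isnads: list[list[str]]  # each ISNAD is an ordered list of narrator names


def build_multidigraph(entries):
    """Return (nodes, edges); edges maps (src, dst) to a list of parallel-edge labels."""
    nodes = set()
    edges = defaultdict(list)
    for entry in entries:
        for isnad_index, chain in enumerate(entry.isnads):
            nodes.update(chain)
            # Link each narrator to the next one listed in the chain.
            for src, dst in zip(chain, chain[1:]):
                edges[(src, dst)].append((entry.hadith_number, isnad_index))
    return nodes, dict(edges)


# Placeholder narrators "A".."D"; this is not real Hadith data.
entries = [
    HadithEntry("Sahih Muslim", 1, [["A", "B", "C"], ["A", "D", "C"]]),
    HadithEntry("Sahih Muslim", 2, [["A", "B", "C"]]),
]
nodes, edges = build_multidigraph(entries)
# Every (hadith, ISNAD) label counts as its own edge.
edge_count = sum(len(labels) for labels in edges.values())
```

Keeping one label per (Hadith, ISNAD) pair is what makes the graph "multi": the same two narrators can be linked by many edges. This would explain how 2092 nodes can carry 77,797 edges. The same structure should map directly onto a NetworkX `MultiDiGraph` or onto Neo4j relationships.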
Researchers can use MIS to explore Hadith credibility, including the classification of Hadiths (Sahih = perfection in the Sanad; Dhaif = imperfection in the Sanad) and of narrators (trustworthy or not). Traditionally, scholars have focused on identifying the longest and shortest Sanad between two narrators; in MIS, the emphasis shifts to determining the optimum (authentic) Sanad by considering narrator qualities. The graph representation of this authentic, manually curated dataset opens the way for computational models that can identify the significance of a chain and of a narrator. The dataset provides researchers with Hadith narrators and Hadith ISNAD that can be used in a wide variety of future studies on Hadith authentication and rule extraction. Moreover, the dataset encourages cross-disciplinary research, bridging the gap between Islamic studies, artificial intelligence (AI), social network analysis (SNA), and graph neural networks (GNNs).
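The contrast between the shortest, the longest, and an "optimum" Sanad can be sketched on a toy graph. The adjacency list, the `reliability` scores, and the weakest-narrator criterion below are all hypothetical choices for illustration. The source does not specify how narrator qualities are scored or combined.

```python
def simple_paths(adj, src, dst, path=None):
    """Yield every cycle-free path from src to dst via depth-first search."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in adj.get(src, ()):
        if nxt not in path:
            yield from simple_paths(adj, nxt, dst, path)


# Toy adjacency list with placeholder narrators; this is not real Hadith data.
adj = {"A": ["B", "D"], "B": ["C"], "D": ["E"], "E": ["C"]}
# Hypothetical reliability scores in [0, 1]; MIS does not define these.
reliability = {"A": 1.0, "B": 0.4, "C": 1.0, "D": 0.9, "E": 0.8}

paths = list(simple_paths(adj, "A", "C"))
shortest = min(paths, key=len)
longest = max(paths, key=len)
# Assumed criterion: a chain is only as strong as its weakest narrator.
optimum = max(paths, key=lambda p: min(reliability[n] for n in p))
```

In this toy case the shortest chain passes through the low-scored narrator "B", so the assumed criterion prefers the longer chain through "D" and "E". Exhaustive path enumeration is only practical on small subgraphs. A graph of the size reported here would likely need a weighted shortest-path search instead.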