Residual-based approach for authenticating pattern of multi-style diacritical Arabic texts.

Affiliations

Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia.

Faculty of Computer Science, Taibah University, Madinah, Saudi Arabia.

Publication

PLoS One. 2018 Jun 20;13(6):e0198284. doi: 10.1371/journal.pone.0198284. eCollection 2018.

Abstract

Arabic script is highly sensitive to the exact arrangement of diacritics and related symbols, because changing them can change the meaning of a word. The most sensitive Arabic text available online is the Digital Qur'an. It is the sacred book of Revelation in Islam, which all Muslims, including non-Arabs, recite as part of their worship. Arabic letters carry diacritics (vowel marks), kashida (elongated letters) and other symbols, so the Qur'an is written and distributed in different styles such as Kufi, Naskh, Thuluth and Uthmani. Social media has become part of daily life, and posting Qur'anic verses downloaded from the web is common. This raises the problem of authenticating Qur'anic passages that appear in different styles. This paper presents a residual-based approach for authenticating Uthmani and plain Qur'anic verses against one common database. The residual (difference) between the Uthmani and plain Qur'anic styles is obtained with an XOR operation. Using predefined data, the proposed approach converts Uthmani text into plain text. The Tuned Boyer-Moore (BMT) exact pattern-matching algorithm then verifies the converted Uthmani verse against a database of plain Qur'anic text. Experimental results show that the approach authenticates multi-style Qur'anic texts with 87.1% accuracy.
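
The pipeline in the abstract can be illustrated with a small, hypothetical sketch. It has three steps: compute a residual, convert Uthmani text to plain text, then search with Tuned Boyer-Moore. The sketch rests on several assumptions that do not come from the paper:

- The residual is modelled as the set XOR (symmetric difference) of the characters used in a reference Uthmani/plain verse pair.
- `LETTER_MAP` is a hypothetical stand-in for the paper's "predefined data" and covers only alef wasla.
- The function names and the simplified Basmala strings are illustrative.

The search routine is a common formulation of Tuned Boyer-Moore: a Horspool-style bad-character table combined with a sentinel-driven fast skip loop.

```python
"""Minimal sketch of residual-based verse authentication (assumptions noted inline)."""

# HYPOTHETICAL stand-in for the paper's "predefined data": Uthmani letter forms
# mapped to plain equivalents. Only ALEF WASLA -> ALEF is listed here.
LETTER_MAP = {"\u0671": "\u0627"}


def substitute(text: str) -> str:
    """Replace Uthmani-specific letter forms listed in LETTER_MAP."""
    return "".join(LETTER_MAP.get(ch, ch) for ch in text)


def residual(uthmani: str, plain: str) -> set[str]:
    """ASSUMPTION: residual = characters found in exactly one style (set XOR)."""
    return set(substitute(uthmani)) ^ set(plain)


def to_plain(uthmani: str, residual_chars: set[str]) -> str:
    """Convert Uthmani text to plain text by dropping residual characters."""
    return "".join(ch for ch in substitute(uthmani) if ch not in residual_chars)


def tuned_boyer_moore(pattern: str, text: str) -> list[int]:
    """Return every start index of pattern in text (Tuned Boyer-Moore)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = pattern[-1]
    # Bad-character table over pattern[:-1]; absent characters shift by m.
    bm_bc: dict[str, int] = {}
    for i in range(m - 1):
        bm_bc[pattern[i]] = m - 1 - i
    shift = bm_bc.get(last, m)  # shift applied after each full comparison
    bm_bc[last] = 0             # zero shift stops the fast skip loop
    y = text + last * m         # sentinel guarantees the skip loop terminates
    matches: list[int] = []
    j = 0
    while j <= n - m:
        # Fast skip loop: jump until the window's last char equals pattern[-1].
        k = bm_bc.get(y[j + m - 1], m)
        while k != 0:
            j += k
            k = bm_bc.get(y[j + m - 1], m)
        # Verify the rest of the window only if it lies inside the real text.
        if j <= n - m and y[j:j + m - 1] == pattern[:-1]:
            matches.append(j)
        j += shift
    return matches


def authenticate(uthmani_verse: str, residual_chars: set[str], plain_db: str) -> bool:
    """Accept a verse if its plain form occurs verbatim in the database."""
    return bool(tuned_boyer_moore(to_plain(uthmani_verse, residual_chars), plain_db))


# Illustrative data: the Basmala in a simplified Uthmani-style spelling,
# written with escapes so every combining mark is visible.
UTHMANI = (
    "\u0628\u0650\u0633\u0652\u0645\u0650 "
    "\u0671\u0644\u0644\u0651\u064E\u0647\u0650 "
    "\u0671\u0644\u0631\u0651\u064E\u062D\u0652\u0645\u064E\u0670\u0646\u0650 "
    "\u0671\u0644\u0631\u0651\u064E\u062D\u0650\u064A\u0645\u0650"
)
# بسم الله الرحمن الرحيم
PLAIN = "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 \u0627\u0644\u0631\u062D\u0645\u0646 \u0627\u0644\u0631\u062D\u064A\u0645"
PLAIN_DB = "\n".join([PLAIN, "الحمد لله رب العالمين"])

if __name__ == "__main__":
    res = residual(UTHMANI, PLAIN)
    print(authenticate(UTHMANI, res, PLAIN_DB))                        # True
    print(authenticate(UTHMANI.replace("\u064A", ""), res, PLAIN_DB))  # False
```

Because the match is exact, any Uthmani spelling variant missing from the substitution table makes an authentic verse fail to match. In this simplified sketch the table therefore carries most of the burden, and a real system would need to derive it from the full corpus.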
