Khan Majid A, Mohammad Nazeeruddin, Ben Brahim Ghassen, Bashar Abul, Latif Ghazanfar
College of Computer Engineering and Science, Prince Mohammad Bin Fahd University, Khobar, Eastern Province, Saudi Arabia.
PeerJ Comput Sci. 2022 Apr 20;8:e955. doi: 10.7717/peerj-cs.955. eCollection 2022.
Author verification of handwritten text is required in several application domains and has drawn considerable attention in the research community. Although several approaches have been proposed for text-independent writer verification of handwritten text, none of them addresses the setting in which author verification must be performed on partially damaged handwritten documents (e.g., during forensic analysis). In this paper, we propose an approach for offline text-independent writer verification of handwritten Arabic text based on individual character shapes (within the Arabic alphabet). The proposed approach enables writer verification for partially damaged documents, provided that some handwritten characters can still be extracted from them. We also provide a mechanism to identify which Arabic characters are more effective during the writer verification process. For this purpose, we collected a new dataset, Arabic Handwritten Alphabet, Words and Paragraphs Per User (AHAWP), in a classroom setting with 82 different users. The dataset consists of 53,199 user-written isolated Arabic characters, 8,144 Arabic words, and 10,780 characters extracted from these words. Convolutional neural network (CNN) based models are developed for verification of writers based on individual characters, with an accuracy of 94% for isolated character shapes and 90% for extracted character shapes. Our proposed approach achieved up to 95% writer verification accuracy for partially damaged documents.
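The abstract does not state how per-character results are combined or how character effectiveness is measured, so the following is only a minimal illustrative sketch under assumed choices. It assumes each character recovered from a damaged document has already been scored by some per-character model (a probability that the claimed writer produced it), combines those scores by simple majority vote, and ranks characters by held-out verification accuracy. The function names `rank_characters` and `verify_document`, the `threshold` parameter, and the majority-vote rule are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_characters(records):
    """Rank characters by per-character verification accuracy (assumed metric).

    records: iterable of (character, was_correct) pairs from a held-out set.
    Returns a list of (character, accuracy), best first; ties broken by character.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for char, was_correct in records:
        totals[char] += 1
        hits[char] += int(was_correct)
    accuracy = {c: hits[c] / totals[c] for c in totals}
    return sorted(accuracy.items(), key=lambda kv: (-kv[1], kv[0]))


def verify_document(char_scores, threshold=0.5):
    """Accept the claimed writer if a strict majority of characters agree.

    char_scores: per-character probabilities (0..1) that the claimed writer
    produced each character recovered from the damaged document.
    The majority-vote rule is an assumption, not the paper's stated method.
    """
    if not char_scores:
        raise ValueError("no characters could be extracted from the document")
    votes = sum(1 for score in char_scores if score >= threshold)
    return votes * 2 > len(char_scores)


if __name__ == "__main__":
    held_out = [("ب", True), ("ب", False), ("ع", True)]
    print(rank_characters(held_out))
    print(verify_document([0.9, 0.8, 0.2]))
```

A weighted vote that favors the higher-ranked characters would be a natural variation, but the choice of aggregation rule is left open here because the abstract does not specify one.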