Khan Majid Ali
College of Computer Engineering and Science, Prince Mohammad Bin Fahd University, Khobar, Eastern Province, Saudi Arabia.
Data Brief. 2022 Feb 13;41:107947. doi: 10.1016/j.dib.2022.107947. eCollection 2022 Apr.
This article presents a dataset of handwritten Arabic alphabets, words and paragraphs (AHAWP). The dataset contains 65 different Arabic alphabets (with beginning, middle, end and regular variations), 10 different Arabic words (which together encompass all Arabic alphabets) and 3 different paragraphs. The data were collected anonymously from 82 different users, each of whom was asked to write every alphabet and word 10 times. A userid uniquely but anonymously identifies the writer of each alphabet, word and paragraph. In total, the dataset consists of 53199 alphabet images, 8144 word images and 241 paragraph images. The dataset can be used for multiple purposes, including optical handwriting recognition of alphabets and words and writer identification (or verification) of handwritten Arabic text. It also makes it possible to evaluate differences between a user's writing style for an isolated alphabet and for the same alphabet written by that user as part of a word or paragraph. The dataset is publicly available at https://data.mendeley.com/datasets/2h76672znt/1.
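As a quick consistency check on the reported totals, the sketch below compares them with the counts implied by the collection protocol. It assumes that every one of the 82 users wrote every alphabet and word the full 10 times and wrote each paragraph once. The abstract does not give a repetition count for paragraphs, so that part is an assumption. Under these assumptions the reported totals fall slightly short of the full grid, which suggests that some samples were missing or excluded. The abstract itself does not explain the gap.

```python
# Figures stated in the abstract
NUM_USERS = 82
NUM_ALPHABET_FORMS = 65
NUM_WORDS = 10
NUM_PARAGRAPHS = 3
REPETITIONS = 10  # each alphabet and word was written 10 times

reported = {"alphabets": 53199, "words": 8144, "paragraphs": 241}

# Assumption: every user completed every item; paragraphs written once each
expected = {
    "alphabets": NUM_USERS * NUM_ALPHABET_FORMS * REPETITIONS,
    "words": NUM_USERS * NUM_WORDS * REPETITIONS,
    "paragraphs": NUM_USERS * NUM_PARAGRAPHS,
}

# Difference between the full collection grid and the reported image counts
gaps = {kind: expected[kind] - reported[kind] for kind in reported}

for kind in reported:
    coverage = reported[kind] / expected[kind]
    print(f"{kind}: expected {expected[kind]}, reported {reported[kind]}, "
          f"gap {gaps[kind]}, coverage {coverage:.1%}")
```

If the totals on the Mendeley page differ from those in the abstract, update the `reported` values before you rerun the check.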