Khaliq Fazli, Shabir Muhammad, Khan Inayat, Ahmad Shafiq, Usman Muhammad, Zubair Muhammad, Huda Shamsul
Department of Computer Science, Islamia College University Peshawar, Peshawar 25000, Pakistan.
Department of Computer Science, University of Buner, Buner 19290, Pakistan.
Sensors (Basel). 2023 Jun 30;23(13):6060. doi: 10.3390/s23136060.
Before the 19th century, all communication and official records relied on handwritten documents, which different ethnic groups cherished as valuable artefacts. Significant efforts have been made to automate the transcription of major languages such as English, French, Arabic, and Chinese. Regional and minor languages have received less research attention, despite their geographical and historical importance. This research focuses on detecting and recognizing Pashto handwritten characters and ligatures, which is essential for preserving this regional cursive language of Pakistan, which is also the national language of Afghanistan. Deep learning techniques were used to detect and recognize Pashto characters and ligatures, using a newly developed Pashto-specific dataset. The dataset was further enhanced through data augmentation, namely scaling and rotation of the handwritten characters and ligatures, which produced many variations of each trajectory. Morphological operations were also applied to minimize gaps in the trajectories, and a median filter was used to remove noise. This dataset will be combined with the existing PHWD-V2 dataset. Several deep learning models were evaluated, including VGG19, MobileNetV2, MobileNetV3, and a customized CNN. The customized CNN achieved the highest accuracy and the lowest loss, with a training accuracy of 93.98%, a validation accuracy of 92.08%, and a testing accuracy of 92.99%.
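The abstract does not give the preprocessing parameters, such as augmentation scale factors and angles, filter size, or structuring element. The following minimal sketch uses only the Python standard library and illustrates the three operations under stated assumptions. A trajectory is assumed to be a list of (x, y) pen points, augmented by scaling and rotating about its centroid. Noise removal and gap filling are assumed to operate on a rasterized 2D grid. A 3×3 window is assumed for both the median filter and the morphological closing (dilation followed by erosion). All function names are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Grid = List[List[int]]


def augment_trajectory(points: Sequence[Point], scale: float, angle_deg: float) -> List[Point]:
    """Scale and rotate a pen trajectory about its centroid (hypothetical helper)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        # Scale the offset from the centroid, then rotate it counter-clockwise.
        dx, dy = (x - cx) * scale, (y - cy) * scale
        out.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return out


def median_filter(img: Grid, k: int = 3) -> Grid:
    """k x k median filter; border pixels are handled by edge replication (assumed)."""
    h, w, r = len(img), len(img[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = sorted(
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            )
            row.append(vals[len(vals) // 2])
        out.append(row)
    return out


def _window_op(img: Grid, k: int, reduce_fn: Callable[[List[int]], int]) -> Grid:
    """Apply reduce_fn over a k x k window; out-of-bounds pixels count as background (0)."""
    h, w, r = len(img), len(img[0]), k // 2
    return [
        [
            reduce_fn([
                img[i + di][j + dj] if 0 <= i + di < h and 0 <= j + dj < w else 0
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            ])
            for j in range(w)
        ]
        for i in range(h)
    ]


def dilate(img: Grid, k: int = 3) -> Grid:
    return _window_op(img, k, max)


def erode(img: Grid, k: int = 3) -> Grid:
    return _window_op(img, k, min)


def close_gaps(img: Grid, k: int = 3) -> Grid:
    """Morphological closing (dilate, then erode) bridges small breaks in a binary stroke."""
    return erode(dilate(img, k), k)


if __name__ == "__main__":
    print(augment_trajectory([(0.0, 0.0), (2.0, 0.0)], scale=2.0, angle_deg=90.0))
    stroke = [[0] * 7 for _ in range(5)]
    stroke[2] = [0, 1, 1, 0, 1, 1, 0]  # horizontal stroke with a one-pixel gap
    print(close_gaps(stroke)[2])
```

Because out-of-bounds pixels are treated as background in both dilation and erosion, strokes that touch the image border can be trimmed by the erosion step. A real pipeline might instead use library routines such as OpenCV's morphology and median-blur functions, with parameters tuned to the dataset.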