Syed Sibtain, Khan Khalil, Khan Maqbool, Khan Rehan Ullah, Aloraini Abdulrahman
Department of IT & CS, Pak-Austria Fachhochschule Institute of Applied Sciences and Technology, Haripur, KP, Pakistan.
Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan.
PeerJ Comput Sci. 2024 Jul 11;10:e2124. doi: 10.7717/peerj-cs.2124. eCollection 2024.
Pashtu is one of the most widely spoken languages in South and Central Asia. Recognizing Pashtu numerals is challenging because the script is cursive. A machine learning-based optical character recognition (OCR) model is one effective way to address this problem. The main aim of this study is to propose an optimized machine learning model that can efficiently identify the Pashtu numerals 0-9. The data were first organized into directories, one per class label. The images were then preprocessed: each was resized to 32 × 32 pixels, normalized by dividing its pixel values by 255, and reshaped for model input. The dataset was split in an 80:20 ratio. Hyperparameters for the LSTM and CNN models were then selected by trial and error. The models were evaluated using accuracy and loss graphs, classification reports, and confusion matrices. The results indicate that the proposed LSTM model slightly outperforms the proposed CNN model, with a macro-averaged precision of 0.9877, recall of 0.9876, and F1 score of 0.9876. Both models recognize Pashtu numerals with an accuracy of nearly 98%.
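The preprocessing and splitting steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names (`resize_nearest`, `preprocess`, `split_80_20`), the nearest-neighbour resizing, the grayscale input, the trailing channel axis, and the random synthetic images are all assumptions made for the example.

```python
import numpy as np

def resize_nearest(image, size=32):
    """Nearest-neighbour resize of a 2-D grayscale image to size x size.
    (Assumed stand-in for whatever resizing routine the authors used.)"""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[rows[:, None], cols]

def preprocess(images, size=32):
    """Resize, scale pixel values to [0, 1], and add a channel axis."""
    resized = np.stack([resize_nearest(img, size) for img in images])
    scaled = resized.astype(np.float32) / 255.0   # normalize by 255
    return scaled.reshape(-1, size, size, 1)      # assumed model input shape

def split_80_20(x, y, seed=0):
    """Shuffle, then keep 80% for training and 20% for testing."""
    order = np.random.default_rng(seed).permutation(len(x))
    cut = len(x) * 8 // 10
    train, test = order[:cut], order[cut:]
    return x[train], x[test], y[train], y[test]

# Synthetic stand-in data: 50 random 64 x 48 "images" with labels 0-9.
rng = np.random.default_rng(42)
raw_images = [rng.integers(0, 256, size=(64, 48), dtype=np.uint8)
              for _ in range(50)]
labels = np.arange(50) % 10

x = preprocess(raw_images)
x_train, x_test, y_train, y_test = split_80_20(x, labels)
print(x.shape, x_train.shape, x_test.shape)
```

In practice the images would be loaded from the per-label directories rather than generated, and the split might be stratified by class; neither detail is specified in the abstract.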
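Macro-averaged precision, recall, and F1 score of the kind reported in the abstract are unweighted means of the per-class scores, which can be read off a confusion matrix. The sketch below shows one way to compute them; the function name `macro_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
import numpy as np

def macro_scores(y_true, y_pred, num_classes=10):
    """Return macro-averaged (precision, recall, F1) from integer labels."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # samples predicted as each class
    actual = cm.sum(axis=1)      # samples truly in each class

    # Classes with no predictions (or no samples) score 0 instead of NaN.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp),
                          where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp),
                   where=denom > 0)

    # Macro average: every class counts equally, regardless of its size.
    return precision.mean(), recall.mean(), f1.mean()

# Toy example with three classes (made-up labels, not the paper's data).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
p, r, f = macro_scores(y_true, y_pred, num_classes=3)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Because each class contributes equally to the mean, macro averaging is a natural summary for a ten-class digit task with roughly balanced classes.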