Bahadar Khan, Ahmad Riaz, Aurangzeb Khursheed, Muhammad Siraj, Ullah Khalil, Hussain Ibrar, Syed Ikram, Shahid Anwar Muhammad
Department of Computer Science, Shaheed Benazir Bhutto University, Sheringal, Pakistan.
Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.
PeerJ Comput Sci. 2024 Jul 26;10:e2089. doi: 10.7717/peerj-cs.2089. eCollection 2024.
Layout analysis is the main component of a typical Document Image Analysis (DIA) system and plays an important role in pre-processing. However, document images in the Pashto language have not been explored so far. This research examines, for the first time, Pashto text together with graphics and proposes a deep learning-based classifier that detects Pashto text and graphics in each document. Another notable contribution of this research is a real dataset containing more than 1,000 images of Pashto documents captured by a camera. We applied a convolutional neural network (CNN), a deep learning technique, to this dataset. Our proposed method builds on an advanced and classical variant of Faster R-CNN called Single-Shot Detector (SSD). The evaluation was performed on 300 images from the test set, on which we achieved a mean average precision (mAP) of 84.90%.
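The mAP figure reported above can be illustrated with a minimal sketch of how mean average precision is commonly computed for a two-class detector (text and graphics). The abstract does not state the IoU threshold or the interpolation scheme, so the 0.5 threshold and the all-point interpolation below are assumptions in the style of PASCAL VOC, and every function name is hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box), one class
    ground_truth: Dict[str, List[Box]],        # image_id -> boxes of that class
    iou_threshold: float = 0.5,                # assumed; not given in the abstract
) -> float:
    """Area under the precision-recall curve for a single class."""
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}

    # Visit detections from most to least confident and mark each as hit or miss.
    hits: List[bool] = []
    for image_id, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(image_id, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_hit = (
            best_idx >= 0
            and best_iou >= iou_threshold
            and not matched[image_id][best_idx]  # each ground-truth box matches once
        )
        if is_hit:
            matched[image_id][best_idx] = True
        hits.append(is_hit)

    # Cumulative precision and recall at each rank.
    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / total_gt)

    # Make precision non-increasing from right to left (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles under the interpolated curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Mean of the per-class AP values, e.g. over 'text' and 'graphics'."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

In use, `average_precision` would be called once per class over the test images, and the resulting values passed to `mean_average_precision` as a dictionary keyed by class name.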