Deep Learning and Particle Swarm Optimisation-Based Techniques for Visually Impaired Humans' Text Recognition and Identification.

Author information

Pandey Binay Kumar, Pandey Digvijay, Wariya Subodh, Aggarwal Gaurav, Rastogi Rahul

Affiliations

Department of Information Technology, College of Technology, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India.

Department of Computer Science and Engineering, Invertis University, Bareilly, India.

Publication information

Augment Hum Res. 2021;6(1):14. doi: 10.1007/s41133-021-00051-5. Epub 2021 Oct 29.

DOI: 10.1007/s41133-021-00051-5

PMID: 40477829

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8553597/

Abstract

Blind people can benefit greatly from a system that localises and reads text embedded in natural scenes and provides useful information, boosting their self-esteem and autonomy in everyday situations. Although existing optical character recognition (OCR) programmes appear fast and effective, most of them cannot correctly recognise text embedded in natural scene images. The methodology described in this paper localises textual image regions and pre-processes them using the naïve Bayesian algorithm. A weighted reading technique is used to extract the correct text from complex image regions. Because images usually contain some noise, filtering is applied during the early pre-processing step. To restore image quality, the input image is processed using gradient and contrast images, and the contrast of the source image is then enhanced using an adaptive image map. The stroke width transform, the Gabor transform, and a weighted naïve Bayesian classifier are applied to complex degraded images for segmentation, feature extraction, and the detection of textual and non-textual elements. Finally, a combination of deep neural networks and particle swarm optimisation is used to recognise the classified text. After recognition, the text in the image is converted into audio output. The IIIT5K dataset is used for development, and the performance of the proposed approach is evaluated using accuracy, recall, precision, and F1-score.
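The abstract does not define the gradient and contrast images or the adaptive map. One common formulation from degraded-document binarisation blends a normalised local contrast, (I_max - I_min) / (I_max + I_min + eps), with the local max-min gradient over a small window. The sketch below assumes that formulation, a 3x3 window, a fixed blending weight `alpha`, and scaling of the gradient by 255; the function name `adaptive_contrast_map` is hypothetical, and this is not presented as the authors' exact method.

```python
def adaptive_contrast_map(img, alpha=0.5, eps=1e-8):
    """Blend normalised local contrast with local gradient (hypothetical sketch).

    Assumed formulation (not confirmed by the paper):
        out = alpha * (hi - lo) / (hi + lo + eps) + (1 - alpha) * (hi - lo) / 255
    where lo/hi are the min/max grey levels in a 3x3 window.
    img: 2-D list of grey levels in [0, 255]; returns a 2-D list of the same shape.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Gather the 3x3 neighbourhood, clipped at the image border.
            window = [img[y][x]
                      for y in range(max(i - 1, 0), min(i + 2, h))
                      for x in range(max(j - 1, 0), min(j + 2, w))]
            lo, hi = min(window), max(window)
            contrast = (hi - lo) / (hi + lo + eps)  # normalised local contrast, in [0, 1]
            gradient = (hi - lo) / 255.0            # local max-min gradient, scaled to [0, 1]
            out[i][j] = alpha * contrast + (1 - alpha) * gradient
    return out
```

With this formulation, the normalised contrast term reacts more strongly to the same absolute difference in dark regions (e.g. grey levels 10 vs 30) than in bright ones (210 vs 230), while the gradient term is brightness-independent; `alpha` trades one off against the other.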
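The abstract names a weighted naïve Bayesian classifier for separating textual from non-textual regions but does not say how the weights enter the model. A minimal sketch, assuming Gaussian class-conditional likelihoods and per-feature weights applied as exponents on the likelihoods (so each feature's log-likelihood is multiplied by its weight), is shown below. The class name, the weight values, and the two toy features are hypothetical; in the described pipeline the features would presumably come from the stroke width and Gabor stages.

```python
import math
from collections import defaultdict


class WeightedGaussianNB:
    """Feature-weighted Gaussian naive Bayes (hypothetical sketch).

    score(c | x) = log P(c) + sum_i w_i * log N(x_i; mu_ci, var_ci)
    The weighting scheme is an assumption; the paper does not specify it.
    """

    def __init__(self, weights, var_floor=1e-9):
        self.weights = list(weights)  # one weight per feature (assumed given)
        self.var_floor = var_floor    # avoids division by zero for constant features

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in groups.items():
            stats = []
            for j in range(len(self.weights)):
                col = [r[j] for r in rows]
                mu = sum(col) / len(col)
                var = sum((v - mu) ** 2 for v in col) / len(col) + self.var_floor
                stats.append((mu, var))
            # Store the log prior and per-feature (mean, variance) for this class.
            self.params[label] = (math.log(len(rows) / n), stats)
        return self

    @staticmethod
    def _log_gauss(x, mu, var):
        return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

    def score(self, x, label):
        log_prior, stats = self.params[label]
        return log_prior + sum(
            w * self._log_gauss(xi, mu, var)
            for w, xi, (mu, var) in zip(self.weights, x, stats)
        )

    def predict(self, x):
        # Pick the class with the highest weighted log-posterior score.
        return max(self.params, key=lambda label: self.score(x, label))
```

For example, fitting on six toy regions with two made-up features and weights `[1.0, 0.5]` separates the two clusters; a weight of 0 would make the classifier ignore that feature entirely.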
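The abstract does not state how particle swarm optimisation is combined with the deep neural network. One common pattern is to let PSO search a network's hyperparameters (or weights) by minimising a validation loss. The sketch below is a plain global-best PSO over a box-bounded search space; the inertia `w` and acceleration coefficients `c1`, `c2` are typical textbook values chosen here as assumptions, and a quadratic function stands in for the unspecified DNN objective.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation (minimisation).

    objective: callable mapping a list of floats to a float (a stand-in for,
               e.g., a DNN's validation loss; that use is an assumption).
    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Calling `pso_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [(-5, 5), (-5, 5)])` drives the swarm toward (1, -2). For hyperparameter search, each position would instead be decoded into settings (learning rate, layer width, and so on) and the objective would train and validate a network, which is far more expensive per evaluation.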
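For reference, the four evaluation measures follow directly from binary confusion-matrix counts. The sketch below treats text as the positive class, which is an assumption: the abstract does not say whether the measures are computed at the region, word, or character level. F1 is the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN).

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Binary confusion-matrix counts (text assumed to be the positive class)."""
    tp: int
    fp: int
    fn: int
    tn: int


def evaluate(c):
    """Return accuracy, precision, recall and F1; empty denominators give 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With illustrative counts TP = 80, FP = 20, FN = 10, TN = 90, this gives accuracy 0.85, precision 0.80, recall 80/90 (about 0.889), and F1 160/190 (about 0.842).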

