Michalak Hubert, Okarma Krzysztof
Faculty of Electrical Engineering, West Pomeranian University of Technology, Szczecin, 70-313 Szczecin, Poland.
Entropy (Basel). 2019 Jun 4;21(6):562. doi: 10.3390/e21060562.
Automatic text recognition from natural images acquired in uncontrolled lighting conditions is a challenging task because shadows hinder the shape analysis and classification of individual characters. Since optical character recognition methods require prior image binarization, applying classical global thresholding methods in such cases makes it impossible to preserve the visibility of all characters. Adaptive binarization, however, does not always lead to satisfactory results for heavily unevenly illuminated document images either. In this paper, an image preprocessing methodology based on local image entropy filtering is proposed, which allows various commonly used image thresholding methods to be improved and can also be useful for text recognition. The proposed approach was verified using a dataset of 140 differently illuminated document images subjected to further text recognition. Experimental results, expressed as Levenshtein distances and F-Measure values for the obtained text strings, are promising and confirm the usefulness of the proposed approach.
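The two quantities named in the abstract, local image entropy and Levenshtein distance, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `local_entropy` and `levenshtein`, the square window, the `radius` parameter, and the border clipping are all assumptions made for illustration, and the abstract does not specify the filter size or how the entropy map is combined with thresholding.

```python
import math
from collections import Counter


def local_entropy(image, radius=1):
    """Shannon entropy (in bits) of the gray levels in a square window
    around each pixel. `image` is a list of rows of integer gray levels.
    Window shape and border clipping are illustrative assumptions."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the window, clipped at the image borders.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            total = len(window)
            # H = sum over gray levels of p * log2(1 / p), with p = n / total.
            result[y][x] = sum(
                (n / total) * math.log2(total / n)
                for n in Counter(window).values()
            )
    return result


def levenshtein(a, b):
    """Edit distance between strings a and b (unit-cost insertions,
    deletions, and substitutions), computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]
```

In such a sketch, flat regions (plain background or uniform shadow) yield entropy near zero, while windows that mix text and background yield higher values, which is what makes an entropy map a plausible cue for locating text before thresholding. The edit distance can then compare a recognized string against its ground truth, for example `levenshtein("kitten", "sitting")`.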