National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China.
IEEE Trans Image Process. 2011 Mar;20(3):800-13. doi: 10.1109/TIP.2010.2070803. Epub 2010 Sep 2.
Text detection and localization in natural scene images is important for content-based image analysis. The problem is challenging because of complex backgrounds, non-uniform illumination, and variations in text font, size, and line orientation. In this paper, we present a hybrid approach to robustly detect and localize text in natural scene images. A text region detector is designed to estimate text existence confidence and scale information in an image pyramid, which help segment candidate text components by local binarization. To efficiently filter out non-text components, we propose a conditional random field (CRF) model that considers unary component properties and binary contextual component relationships, with supervised parameter learning. Finally, text components are grouped into text lines/words with a learning-based energy minimization method. Because all three stages are learning-based, very few parameters require manual tuning. Experimental results on the ICDAR 2005 competition dataset show that our approach achieves higher precision and recall than state-of-the-art methods. We also evaluated our approach on a multilingual image dataset, with promising results.
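The local binarization step mentioned above can be illustrated with a minimal sketch. The abstract does not specify the thresholding rule, so the Niblack-style threshold (local mean plus `k` times local standard deviation), the function name `niblack_binarize`, and the parameters `radius` and `k` are all assumptions made for illustration. In the described approach the window scale would come from the text region detector, whereas here it is a fixed argument.

```python
from math import sqrt
from typing import List

Image = List[List[float]]


def niblack_binarize(gray: Image, radius: int = 1, k: float = -0.2) -> List[List[int]]:
    """Label a pixel 1 (dark foreground) if it is below mean + k * std of its window.

    Assumption: a Niblack-style rule with a fixed window radius; this is an
    illustrative stand-in, not the thresholding rule confirmed by the abstract.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the (2 * radius + 1)^2 window, clipped at the image border.
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            std = sqrt(sum((v - mean) ** 2 for v in window) / len(window))
            threshold = mean + k * std
            out[y][x] = 1 if gray[y][x] < threshold else 0
    return out
```

Because the threshold adapts to each neighborhood, a dark stroke on a bright but unevenly lit background can still be separated, which is the usual motivation for preferring a local rule over a single global threshold in scene images.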