IEEE Trans Image Process. 2016 Jun;25(6):2529-41. doi: 10.1109/TIP.2016.2547588.
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where cluttered background information may dominate the true text features in the deep representation. This reduces discriminative power and robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that focuses particularly on extracting text-related regions and features from image components. We develop a new learning mechanism to train the Text-CNN with multi-level, rich supervised information, including a text region mask, a character label, and binary text/non-text information. This rich supervision equips the Text-CNN with a strong capability for discriminating ambiguous text and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where the low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing the intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results.
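As a rough illustration of the multi-task objective described above, the sketch below (in PyTorch; not the authors' implementation) attaches three heads to a shared convolutional trunk: text region mask regression, character classification, and the main binary text/non-text classification, with their losses summed. The trunk architecture, head designs, and the loss weights lambda_mask and lambda_char are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of a multi-task Text-CNN training objective (PyTorch).
# Layer sizes, head designs, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, num_chars=36, mask_size=32):
        super().__init__()
        # Shared convolutional trunk over a 32x32 grayscale component patch.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_dim = 128 * 8 * 8
        # Auxiliary head 1: text region mask (low-level supervision).
        self.mask_head = nn.Linear(feat_dim, mask_size * mask_size)
        # Auxiliary head 2: character label (mid-level supervision).
        self.char_head = nn.Linear(feat_dim, num_chars)
        # Main head: binary text/non-text classification.
        self.text_head = nn.Linear(feat_dim, 2)

    def forward(self, x):
        f = self.trunk(x).flatten(1)
        return self.mask_head(f), self.char_head(f), self.text_head(f)

def multitask_loss(outputs, mask_gt, char_gt, text_gt,
                   lambda_mask=0.3, lambda_char=0.3):
    """Combine the main text/non-text loss with the two auxiliary losses."""
    mask_pred, char_pred, text_pred = outputs
    l_mask = nn.functional.mse_loss(torch.sigmoid(mask_pred), mask_gt)
    l_char = nn.functional.cross_entropy(char_pred, char_gt)
    l_text = nn.functional.cross_entropy(text_pred, text_gt)
    return l_text + lambda_mask * l_mask + lambda_char * l_char
```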