SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition.

Publication information

IEEE Trans Image Process. 2021;30:1687-1701. doi: 10.1109/TIP.2020.3045602. Epub 2021 Jan 14.

Abstract

Scene text recognition, the final step of a scene text reading system, has made impressive progress with deep neural networks. However, existing recognition methods are devoted to handling geometrically regular or irregular scene text, and they remain limited on semantically arbitrary-orientation scene text. Meanwhile, previous scene text recognizers usually learn single-scale feature representations for characters of various scales, which cannot model effective contexts for different characters. In this paper, we propose a novel scale-adaptive orientation attention network (SLOAN) for arbitrary-orientation scene text recognition, which consists of a dynamic log-polar transformer and a sequence recognition network. Specifically, the dynamic log-polar transformer learns the log-polar origin and adaptively converts arbitrary rotations and scales of scene text into shifts in log-polar space, which helps generate a rotation-aware and scale-aware visual representation. The sequence recognition network is an encoder-decoder model that incorporates a novel character-level receptive field attention module to encode more valid contexts for characters of various scales. The whole architecture can be trained end to end, requiring only the word image and its corresponding ground-truth text. Extensive experiments on several public datasets demonstrate the effectiveness and superiority of the proposed method.
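
The property the dynamic log-polar transformer relies on, that rotation and scaling about the log-polar origin become plain shifts in log-polar coordinates, can be checked numerically. The snippet below is a minimal sketch of that geometric fact only, not the paper's module: the origin is fixed by hand here (the paper learns it), and the helper names `to_log_polar` and `rotate_and_scale` are hypothetical.

```python
import math

def to_log_polar(point, origin):
    """Return (log-radius, angle) of a 2-D point relative to the origin."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def rotate_and_scale(point, origin, angle, scale):
    """Rotate a point by `angle` and scale it by `scale` about the origin."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    c, s = math.cos(angle), math.sin(angle)
    return (origin[0] + scale * (c * dx - s * dy),
            origin[1] + scale * (s * dx + c * dy))

# Assumed values for illustration; in the paper the origin is learned.
origin = (3.0, 2.0)
points = [(4.0, 2.0), (5.0, 3.0), (3.5, 3.5)]
angle, scale = math.pi / 6, 2.0

shifts = []
for p in points:
    rho0, theta0 = to_log_polar(p, origin)
    rho1, theta1 = to_log_polar(rotate_and_scale(p, origin, angle, scale), origin)
    # Difference in log-polar coordinates before and after the transform.
    shifts.append((rho1 - rho0, theta1 - theta0))

for d_rho, d_theta in shifts:
    print(round(d_rho, 6), round(d_theta, 6))
```

Every point moves by the same offset, log(scale) along the log-radius axis and `angle` along the angular axis, which is why a network operating in log-polar space sees rotation and scale changes as translations. The angles in this example are chosen so that no wrap-around at ±π occurs; a real implementation would have to handle that periodicity.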

