SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition.

Publication Information

IEEE Trans Image Process. 2021;30:1687-1701. doi: 10.1109/TIP.2020.3045602. Epub 2021 Jan 14.

Abstract

Scene text recognition, the final step of a scene text reading system, has made impressive progress with deep neural networks. However, existing recognition methods are devoted to handling geometrically regular or irregular scene text, and they remain limited on scene text with semantically arbitrary orientations. Meanwhile, previous scene text recognizers usually learn single-scale feature representations for characters of various scales, which cannot model effective contexts for different characters. In this paper, we propose a novel scale-adaptive orientation attention network (SLOAN) for arbitrary-orientation scene text recognition, which consists of a dynamic log-polar transformer and a sequence recognition network. Specifically, the dynamic log-polar transformer learns the log-polar origin to adaptively convert arbitrary rotations and scales of scene text into shifts in log-polar space, which helps generate rotation-aware and scale-aware visual representations. Next, the sequence recognition network is an encoder-decoder model that incorporates a novel character-level receptive field attention module to encode more valid contexts for characters of various scales. The whole architecture can be trained end to end, requiring only the word image and its corresponding ground-truth text. Extensive experiments on several public datasets demonstrate the effectiveness and superiority of the proposed method.
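The claim behind the dynamic log-polar transformer is a property of log-polar coordinates: when points are expressed as (log r, θ) about an origin, a rotation about that origin becomes a shift along the θ axis, and a scaling becomes a shift along the log r axis. The sketch below illustrates only that coordinate property in plain Python. It assumes a fixed, hand-picked origin, whereas SLOAN learns the origin; the function names (`to_log_polar`, `rotate_scale`, `log_polar_shift`) and the sample points are hypothetical and are not taken from the paper.

```python
import math

def to_log_polar(x, y, cx, cy):
    """Map a Cartesian point (x, y) to (log r, theta) about origin (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def rotate_scale(x, y, cx, cy, angle, scale):
    """Rotate by `angle` radians and scale by `scale` about (cx, cy)."""
    dx, dy = x - cx, y - cy
    c, s = math.cos(angle), math.sin(angle)
    return cx + scale * (c * dx - s * dy), cy + scale * (s * dx + c * dy)

def log_polar_shift(point, origin, angle, scale):
    """Return (delta log r, delta theta) that the rotation+scaling causes."""
    x, y = point
    log_r0, theta0 = to_log_polar(x, y, *origin)
    xt, yt = rotate_scale(x, y, *origin, angle, scale)
    log_r1, theta1 = to_log_polar(xt, yt, *origin)
    # theta is periodic, so wrap the difference into [-pi, pi].
    d = theta1 - theta0
    d_theta = math.atan2(math.sin(d), math.cos(d))
    return log_r1 - log_r0, d_theta

# Hypothetical origin (fixed here; learned in SLOAN) and an arbitrary transform.
origin = (10.0, 5.0)
angle, scale = 0.7, 1.8

for p in [(14.0, 5.0), (10.0, 9.0), (7.0, 2.0)]:
    # Every point moves by the same offset: (log(scale), angle).
    print(p, log_polar_shift(p, origin, angle, scale))
```

Because every point receives the same offset, a translation-equivariant model such as a CNN applied in log-polar space would see a rotated or rescaled word as a shifted one (cyclically along the angle axis). The property only holds for transforms centred on the chosen origin, which is presumably why the abstract emphasizes learning that origin rather than fixing it.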

Similar Articles

Attention Guided Feature Encoding for Scene Text Recognition.

J Imaging. 2022 Oct 8;8(10):276. doi: 10.3390/jimaging8100276.

An Algorithm Based on Text Position Correction and Encoder-Decoder Network for Text Recognition in the Scene Image of Visual Sensors.

Sensors (Basel). 2020 May 22;20(10):2942. doi: 10.3390/s20102942.

Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition.

IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12908-12921. doi: 10.1109/TPAMI.2022.3230962. Epub 2023 Oct 3.

Explainable Connectionist-Temporal-Classification-Based Scene Text Recognition.

J Imaging. 2023 Nov 15;9(11):248. doi: 10.3390/jimaging9110248.

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition.

IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371. Epub 2016 Dec 29.

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting.

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7123-7141. doi: 10.1109/TPAMI.2022.3223908. Epub 2023 May 5.

Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images.

Data Brief. 2020 May 21;31:105749. doi: 10.1016/j.dib.2020.105749. eCollection 2020 Aug.

Towards End-to-End Text Spotting in Natural Scenes.

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7266-7281. doi: 10.1109/TPAMI.2021.3095916. Epub 2022 Sep 14.

TextField: Learning a Deep Direction Field for Irregular Scene Text Detection.

IEEE Trans Image Process. 2019 Nov;28(11):5566-5579. doi: 10.1109/TIP.2019.2900589. Epub 2019 Feb 21.
