Inverse-Like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling.

Author information

Zhang Shi-Xue, Yang Chun, Zhu Xiaobin, Zhou Hongyang, Wang Hongfa, Yin Xu-Cheng

Publication information

IEEE Trans Image Process. 2024;33:825-839. doi: 10.1109/TIP.2024.3352399. Epub 2024 Jan 19.

DOI:10.1109/TIP.2024.3352399

Abstract

Scene text spotting is a challenging task, especially for inverse-like scene text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed. In this paper, we propose a unified end-to-end trainable inverse-like antagonistic text spotting framework dubbed IATS, which can effectively spot inverse-like scene texts without sacrificing general ones. Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM). To optimize and train REM, we propose a joint reading-order estimation loss (L) consisting of a classification loss, an orthogonality loss, and a distribution loss. With the help of IBM, we can divide the initial text boundary into two symmetric sets of control points and iteratively refine the new text boundary using a lightweight boundary refinement module (BRM) to adapt to various shapes and scales. To alleviate the incompatibility between text detection and recognition, we propose a dynamic sampling module (DSM) with a thin-plate spline that can dynamically sample appropriate features for recognition in the detected text region. Without extra supervision, the DSM can proactively learn to sample appropriate features for text recognition through the gradient returned by the recognition module. Extensive experiments on both challenging scene text and inverse-like scene text datasets demonstrate that our method achieves superior performance on both irregular and inverse-like text spotting.
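The abstract states that the DSM uses a thin-plate spline (TPS) to sample recognition features inside the detected text region. The NumPy sketch below illustrates only the generic TPS rectification idea, not the authors' DSM. It assumes the detected boundary is a polygon of 2N points (N top points left to right, then N bottom points right to left), maps the two symmetric sets of control points onto the top and bottom rows of a fixed `out_h` x `out_w` grid, and fills the grid by bilinear sampling. The function names (`tps_kernel`, `fit_tps`, `apply_tps`, `bilinear_sample`, `tps_rectify`), the point-ordering convention, and the default grid size are hypothetical choices for this illustration.

```python
import numpy as np


def tps_kernel(r2):
    """Radial basis U = r^2 * log(r^2), with U(0) = 0 by convention."""
    safe = np.where(r2 > 0, r2, 1.0)  # log(1) = 0, so zero distances give U = 0
    return r2 * np.log(safe)


def fit_tps(target_pts, source_pts):
    """Solve the TPS system mapping target (rectified) coords to source (image) coords.

    Returns a (K + 3, 2) array: K radial weights followed by 3 affine coefficients.
    """
    k = target_pts.shape[0]
    d2 = np.sum((target_pts[:, None, :] - target_pts[None, :, :]) ** 2, axis=-1)
    p = np.hstack([np.ones((k, 1)), target_pts])  # rows [1, x, y]
    system = np.zeros((k + 3, k + 3))
    system[:k, :k] = tps_kernel(d2)
    system[:k, k:] = p
    system[k:, :k] = p.T
    rhs = np.zeros((k + 3, 2))
    rhs[:k] = source_pts
    return np.linalg.solve(system, rhs)


def apply_tps(params, target_pts, query):
    """Map (Q, 2) query points from target space into source space."""
    d2 = np.sum((query[:, None, :] - target_pts[None, :, :]) ** 2, axis=-1)
    p = np.hstack([np.ones((query.shape[0], 1)), query])
    return tps_kernel(d2) @ params[:-3] + p @ params[-3:]


def bilinear_sample(img, xy):
    """Sample an (H, W) or (H, W, C) array at fractional (x, y) positions."""
    h, w = img.shape[:2]
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    if img.ndim == 3:  # broadcast the weights over the channel axis
        wx, wy = wx[:, None], wy[:, None]
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def tps_rectify(img, polygon, out_h=8, out_w=32):
    """Warp the region inside a 2N-point boundary polygon onto an out_h x out_w grid.

    Hypothetical polygon order: N top points left to right, then N bottom
    points right to left.
    """
    n = polygon.shape[0] // 2
    source = np.vstack([polygon[:n], polygon[n:][::-1]]).astype(float)
    xs = np.linspace(0, out_w - 1, n)
    target = np.vstack([
        np.stack([xs, np.zeros(n)], axis=1),              # top edge -> row 0
        np.stack([xs, np.full(n, out_h - 1.0)], axis=1),  # bottom edge -> last row
    ])
    params = fit_tps(target, source)
    gy, gx = np.mgrid[0:out_h, 0:out_w]
    query = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
    sampled = bilinear_sample(img, apply_tps(params, target, query))
    return sampled.reshape((out_h, out_w) + img.shape[2:])
```

Because the TPS passes exactly through its control points, a curved boundary is straightened onto the fixed grid, and an axis-aligned rectangle reduces to a plain crop. This fixed geometric warp does not model what the abstract describes as learned sampling: in a trainable spotter the warp would run in a differentiable framework so that gradients from the recognition module can reach the sampler, which plain NumPy does not provide.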

Similar articles

Boundary TextSpotter: Toward Arbitrary-Shaped Scene Text Spotting.

IEEE Trans Image Process. 2022;31:6200-6212. doi: 10.1109/TIP.2022.3206615. Epub 2022 Sep 28.

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes.

IEEE Trans Pattern Anal Mach Intell. 2021 Feb;43(2):532-548. doi: 10.1109/TPAMI.2019.2937086. Epub 2021 Jan 11.

ASTS: A Unified Framework for Arbitrary Shape Text Spotting.

IEEE Trans Image Process. 2020;29:5924-5936. doi: 10.1109/TIP.2020.2984082. Epub 2020 Apr 30.

Towards End-to-End Text Spotting in Natural Scenes.

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7266-7281. doi: 10.1109/TPAMI.2021.3095916. Epub 2022 Sep 14.

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting.

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7123-7141. doi: 10.1109/TPAMI.2022.3223908. Epub 2023 May 5.

SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition.

IEEE Trans Image Process. 2021;30:1687-1701. doi: 10.1109/TIP.2020.3045602. Epub 2021 Jan 14.

A Robot Object Recognition Method Based on Scene Text Reading in Home Environments.

Sensors (Basel). 2021 Mar 9;21(5):1919. doi: 10.3390/s21051919.

Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images.

Data Brief. 2020 May 21;31:105749. doi: 10.1016/j.dib.2020.105749. eCollection 2020 Aug.

SPTS v2: Single-Point Scene Text Spotting.

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15665-15679. doi: 10.1109/TPAMI.2023.3312285. Epub 2023 Nov 3.
