Zhang Shi-Xue, Yang Chun, Zhu Xiaobin, Zhou Hongyang, Wang Hongfa, Yin Xu-Cheng
IEEE Trans Image Process. 2024;33:825-839. doi: 10.1109/TIP.2024.3352399. Epub 2024 Jan 19.
Scene text spotting is a challenging task, especially for inverse-like scene text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed. In this paper, we propose a unified, end-to-end trainable inverse-like antagonistic text spotting framework dubbed IATS, which can effectively spot inverse-like scene text without sacrificing performance on general text. Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM). To optimize and train REM, we propose a joint reading-order estimation loss (L) consisting of a classification loss, an orthogonality loss, and a distribution loss. With the help of IBM, we can divide the initial text boundary into two symmetric sets of control points and iteratively refine the new text boundary using a lightweight boundary refinement module (BRM) to adapt to various shapes and scales. To alleviate the incompatibility between text detection and recognition, we propose a dynamic sampling module (DSM) with a thin-plate spline that can dynamically sample appropriate features for recognition in the detected text region. Without extra supervision, the DSM can proactively learn to sample appropriate features for text recognition through the gradient returned by the recognition module. Extensive experiments on both challenging scene text and inverse-like scene text datasets demonstrate that our method achieves superior performance on both irregular and inverse-like text spotting.
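The abstract states that the DSM uses a thin-plate spline (TPS) to sample recognition features inside the detected text region. The following is a minimal NumPy sketch of generic TPS rectification, not the authors' implementation. It assumes the boundary is given as K control points on the upper edge and K on the lower edge in pixel coordinates, maps them onto a normalized rectangle, and samples a feature map with bilinear interpolation. All function names (`tps_kernel`, `fit_tps`, `apply_tps`, `bilinear_sample`, `tps_rectify`) and the output size are hypothetical. The sketch uses fixed control points and does not model how the DSM learns its sampling from the recognition gradient.

```python
import numpy as np


def tps_kernel(r2: np.ndarray) -> np.ndarray:
    """TPS radial basis U(r) = r^2 * log(r^2), with U(0) = 0."""
    # Substitute 1 for zero distances so log(1) = 0 yields U(0) = 0 without warnings.
    safe = np.where(r2 == 0, 1.0, r2)
    return r2 * np.log(safe)


def fit_tps(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for TPS parameters mapping src points (N, 2) exactly onto dst (N, 2)."""
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    p = np.hstack([np.ones((n, 1)), src])            # affine part [1, x, y]
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = tps_kernel(d2)
    system[:n, n:] = p
    system[n:, :n] = p.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    return np.linalg.solve(system, rhs)               # (N + 3, 2): radial weights, then affine terms


def apply_tps(params: np.ndarray, src: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map query points (M, 2) from the rectified space into image space."""
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    basis = np.hstack([tps_kernel(d2), np.ones((pts.shape[0], 1)), pts])
    return basis @ params


def bilinear_sample(feat: np.ndarray, xy: np.ndarray) -> np.ndarray:
    """Sample feat (H, W, C) at float pixel coordinates xy (M, 2) given as (x, y)."""
    h, w = feat.shape[:2]
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def tps_rectify(feat, top_pts, bottom_pts, out_h=8, out_w=32):
    """Warp the region bounded by top_pts/bottom_pts (K, 2 each) into an out_h x out_w patch."""
    k = top_pts.shape[0]
    u = np.linspace(0.0, 1.0, k)
    # Hypothetical layout: upper edge on v = 0, lower edge on v = 1 of the unit square.
    src = np.vstack([np.stack([u, np.zeros(k)], axis=1),
                     np.stack([u, np.ones(k)], axis=1)])
    dst = np.vstack([top_pts, bottom_pts]).astype(float)
    params = fit_tps(src, dst)
    gx, gy = np.meshgrid(np.linspace(0, 1, out_w), np.linspace(0, 1, out_h))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    img_xy = apply_tps(params, src, grid)
    return bilinear_sample(feat, img_xy).reshape(out_h, out_w, -1)


# Usage: a "feature map" whose channels are the (x, y) pixel coordinates,
# so the rectified patch shows exactly where each output cell sampled from.
ys, xs = np.mgrid[0:12, 0:24]
feat = np.stack([xs, ys], axis=-1).astype(float)
edge_x = np.linspace(4.0, 20.0, 5)
top = np.stack([edge_x, np.full(5, 2.0)], axis=1)
bottom = np.stack([edge_x, np.full(5, 6.0)], axis=1)
patch = tps_rectify(feat, top, bottom)                       # left-to-right reading order
flipped = tps_rectify(feat, top[::-1], bottom[::-1])         # reversed control-point order
```

In this sketch the order of the control points alone fixes the orientation of the rectified patch: reversing it makes the first output column sample from the right end of the region. This is one way to see why an estimated reading order matters before sampling features for mirrored or otherwise inverse-like text.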