Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2021 Feb;43(2):532-548. doi: 10.1109/TPAMI.2019.2937086. Epub 2021 Jan 11.

Abstract

Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are closely related and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is presented. Unlike previous text spotters, which follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter has a simple and smooth end-to-end learning procedure, in which both detection and recognition are achieved directly in two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance the performance and universality. Because both detection and recognition use the proposed two-dimensional representation, the network easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. We also evaluate the recognition module of our method separately; it significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
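
The abstract states that recognition is done through semantic segmentation in two-dimensional space but does not say how a string is read out of the per-pixel predictions. The following is a minimal sketch of one plausible decoding step, not the authors' implementation. It assumes a stack of probability maps whose channel 0 is background and whose remaining channels each correspond to one character. The names `CHARSET`, `connected_regions`, and `decode_char_maps`, the 36-character alphabet, the 0.5 threshold, and the left-to-right ordering are all hypothetical choices made for illustration.

```python
import numpy as np

# Assumed alphabet for illustration; the abstract does not specify one.
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"


def connected_regions(mask):
    """Label 4-connected True regions of a 2D boolean mask via flood fill."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                count += 1
                labels[y, x] = count
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = count
                            stack.append((ny, nx))
    return labels, count


def decode_char_maps(prob_maps, threshold=0.5):
    """Decode maps of shape (1 + len(CHARSET), H, W) into a string.

    Channel 0 is treated as background probability (an assumption).
    """
    foreground = prob_maps[0] < threshold
    labels, count = connected_regions(foreground)
    chars = []
    for region in range(1, count + 1):
        ys, xs = np.nonzero(labels == region)
        # Average each character channel over the region, then vote.
        mean_scores = prob_maps[1:, ys, xs].mean(axis=1)
        chars.append((float(xs.mean()), CHARSET[int(mean_scores.argmax())]))
    # Order characters by horizontal centre (assumes roughly horizontal text).
    return "".join(c for _, c in sorted(chars))


# Synthetic input: two square character regions, "a" then "7".
maps = np.zeros((1 + len(CHARSET), 8, 20))
maps[0] = 1.0
for char, cols in (("a", slice(2, 6)), ("7", slice(10, 14))):
    maps[0, 2:6, cols] = 0.0
    maps[1 + CHARSET.index(char), 2:6, cols] = 1.0

print(decode_char_maps(maps))  # prints "a7"
```

Sorting by horizontal centre is the simplest ordering rule and would not hold for vertical or strongly curved text, where an ordering along the text line would be needed instead.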
