

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371. Epub 2016 Dec 29.

Abstract

Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most existing algorithms, whose components are trained and tuned separately. (2) It naturally handles sequences of arbitrary length, requiring no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves strong performance in both lexicon-free and lexicon-based scene text recognition tasks. (4) It produces an effective yet much smaller model, which is more practical for real-world application scenarios. Experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over prior methods. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which demonstrates its generality.
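The abstract describes the transcription stage only at a high level. As an illustration, assume the transcription layer is trained with Connectionist Temporal Classification (CTC), as in the published network (commonly called CRNN). A lexicon-free prediction can then be read off per-frame label scores by best-path decoding: take the arg-max label in each frame, merge consecutive repeats, and drop the blank label. The sketch below is a minimal, framework-free illustration, not the authors' code. The names `ctc_greedy_decode`, `edit_distance` and `lexicon_decode`, the use of index 0 as the blank, and the alphabet mapping are hypothetical choices. `lexicon_decode` is a simplified stand-in for lexicon-based mode that snaps to the nearest lexicon word by edit distance; it is not the paper's scoring rule.

```python
from typing import List, Sequence

BLANK = 0  # hypothetical convention: label 0 is the CTC "blank"


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding over per-frame scores.

    Label k >= 1 maps to alphabet[k - 1]; label 0 is the blank.
    """
    # Arg-max label for each frame (one frame per column of the feature map)
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    out: List[str] = []
    prev = None
    for label in best:
        # Emit only when the label changes and is not blank; a blank between
        # two identical labels lets the output contain a doubled character.
        if label != prev and label != BLANK:
            out.append(alphabet[label - 1])
        prev = label
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lexicon_decode(prediction: str, lexicon: Sequence[str]) -> str:
    """Simplified lexicon-based mode: return the closest lexicon word."""
    return min(lexicon, key=lambda w: edit_distance(prediction, w))


if __name__ == "__main__":
    def onehot(i: int) -> List[float]:
        return [1.0 if k == i else 0.0 for k in range(4)]

    # 8 frames over labels {blank, a, b, c}
    frames = [onehot(i) for i in [1, 1, 0, 1, 2, 2, 0, 3]]
    raw = ctc_greedy_decode(frames, "abc")
    print(raw)                                   # → aabc
    print(lexicon_decode(raw, ["abc", "xyz"]))   # → abc
```

Because repeats are merged and blanks removed, T frames can map to an output of any length up to T, which is why no explicit character segmentation is needed.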

