Suppr
超能文献

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371. Epub 2016 Dec 29.

Abstract

Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. The experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior arts. Moreover, the proposed algorithm performs well in the task of image-based music score recognition, which evidently verifies the generality of it.

Similar articles

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition.

IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371. Epub 2016 Dec 29.

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes.

IEEE Trans Pattern Anal Mach Intell. 2021 Feb;43(2):532-548. doi: 10.1109/TPAMI.2019.2937086. Epub 2021 Jan 11.

TextBoxes++: A Single-Shot Oriented Scene Text Detector.

IEEE Trans Image Process. 2018 Aug;27(8):3676-3690. doi: 10.1109/TIP.2018.2825107. Epub 2018 Apr 9.

SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition.

IEEE Trans Image Process. 2021;30:1687-1701. doi: 10.1109/TIP.2020.3045602. Epub 2021 Jan 14.

Attention Guided Feature Encoding for Scene Text Recognition.

J Imaging. 2022 Oct 8;8(10):276. doi: 10.3390/jimaging8100276.

Text Recognition Model Based on Multi-Scale Fusion CRNN.

Sensors (Basel). 2023 Aug 8;23(16):7034. doi: 10.3390/s23167034.

Real-Time Lexicon-Free Scene Text Localization and Recognition.

IEEE Trans Pattern Anal Mach Intell. 2016 Sep;38(9):1872-85. doi: 10.1109/TPAMI.2015.2496234. Epub 2015 Oct 30.

An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment.

Sensors (Basel). 2020 Mar 11;20(6):1556. doi: 10.3390/s20061556.

ASTS: A Unified Framework for Arbitrary Shape Text Spotting.

IEEE Trans Image Process. 2020;29:5924-5936. doi: 10.1109/TIP.2020.2984082. Epub 2020 Apr 30.

Towards End-to-End Text Spotting in Natural Scenes.

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7266-7281. doi: 10.1109/TPAMI.2021.3095916. Epub 2022 Sep 14.

Cited by

Detection and Recognition of Bilingual Urdu and English Text in Natural Scene Images Using a Convolutional Neural Network-Recurrent Neural Network Combination with a Connectionist Temporal Classification Decoder.

Sensors (Basel). 2025 Aug 19;25(16):5133. doi: 10.3390/s25165133.

Two key algorithms for intelligent inspection robots in electric bicycle charging sheds.

Sci Rep. 2025 May 5;15(1):15690. doi: 10.1038/s41598-025-99825-9.

Semisupervised adaptive learning models for IDH1 mutation status prediction.

PLoS One. 2025 May 5;20(5):e0321404. doi: 10.1371/journal.pone.0321404. eCollection 2025.

EDPNet (Efficient DB and PARSeq Network): A Robust Framework for Online Digital Meter Detection and Recognition Under Challenging Scenarios.

Sensors (Basel). 2025 Apr 20;25(8):2603. doi: 10.3390/s25082603.

Text-in-Image Enhanced Self-Supervised Alignment Model for Aspect-Based Multimodal Sentiment Analysis on Social Media.

Sensors (Basel). 2025 Apr 17;25(8):2553. doi: 10.3390/s25082553.

Single-Character-Based Embedding Feature Aggregation Using Cross-Attention for Scene Text Super-Resolution.

Sensors (Basel). 2025 Apr 2;25(7):2228. doi: 10.3390/s25072228.

Linguistic-visual based multimodal Yi character recognition.

Sci Rep. 2025 Apr 7;15(1):11874. doi: 10.1038/s41598-025-96397-6.

Attention-based handwritten Chinese recognition for power grid maintenance documents.

Sci Prog. 2025 Jan-Mar;108(1):368504241309786. doi: 10.1177/00368504241309786. Epub 2025 Mar 27.

Brain tumor intelligent diagnosis based on Auto-Encoder and U-Net feature extraction.

PLoS One. 2025 Mar 24;20(3):e0315631. doi: 10.1371/journal.pone.0315631. eCollection 2025.

Vision-Based Localization in Urban Areas for Mobile Robots.

Sensors (Basel). 2025 Feb 14;25(4):1178. doi: 10.3390/s25041178.
