Attention Guided Feature Encoding for Scene Text Recognition.

Authors

Hassan Ehtesham, V L Lekshmi

Affiliation

Department of Computer Science and Engineering, Kuwait College of Science and Technology, Doha District, Block 4, Kuwait City 35004, Kuwait.

Publication

J Imaging. 2022 Oct 8;8(10):276. doi: 10.3390/jimaging8100276.

Abstract

Real-life scene images exhibit a wide range of variations in text appearance, including complex shapes, differing sizes, and fancy font properties. Consequently, recognizing text in scene images remains a challenging problem in computer vision research. We present a scene text recognition methodology built on a novel feature-enhanced convolutional recurrent neural network architecture. Our work addresses both scene text recognition and sequence-to-sequence modeling, proposing a novel deep encoder-decoder network. The encoder is designed around a hierarchy of convolutional blocks equipped with spatial attention blocks, followed by bidirectional long short-term memory (BiLSTM) layers. In contrast to existing scene text recognition methods, which apply temporal attention on the decoder side of the architecture, our convolutional encoder incorporates a novel spatial attention design that guides feature extraction toward the textual details in scene text images. Experiments and analysis demonstrate that our approach learns robust, text-specific feature sequences for input images, because the convolutional feature extractor is tuned to capture a broader spatial text context. Extensive experiments on the ICDAR2013, ICDAR2015, IIIT5K and SVT datasets demonstrate an improvement over many important state-of-the-art methods.
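The abstract does not specify the internals of the spatial attention block, so the following is only a rough illustration under stated assumptions, not the authors' implementation. It sketches a CBAM-style spatial attention (pool across channels with mean and max, squash with a sigmoid, rescale every channel at each position) followed by a CRNN-style map-to-sequence step that turns each feature-map column into one frame for the BiLSTM layers. The scalar weights `w_avg`, `w_max`, and `bias` are hypothetical stand-ins for a learned convolution, and all function names are illustrative.

```python
import math
from typing import List

# A feature map is stored as nested lists indexed [channel][row][col] (C x H x W).
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(fmap: FeatureMap, w_avg: float = 1.0,
                      w_max: float = 1.0, bias: float = 0.0) -> FeatureMap:
    """CBAM-style spatial attention (assumed design, not the paper's exact block).

    For each (row, col) position, pool across channels with mean and max,
    combine the two pooled values with hypothetical scalar weights (standing in
    for a learned convolution), squash with a sigmoid, and multiply every
    channel at that position by the resulting weight in (0, 1).
    """
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    mask = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            column = [fmap[k][r][c] for k in range(channels)]
            avg = sum(column) / channels
            mx = max(column)
            mask[r][c] = sigmoid(w_avg * avg + w_max * mx + bias)
    return [[[fmap[k][r][c] * mask[r][c] for c in range(width)]
             for r in range(height)]
            for k in range(channels)]


def map_to_sequence(fmap: FeatureMap) -> List[List[float]]:
    """Collapse a C x H x W map into W frames of C*H features, one per column.

    Each image column becomes one time step, the kind of left-to-right
    sequence a bidirectional LSTM would read in both directions.
    """
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[fmap[k][r][c] for k in range(channels) for r in range(height)]
            for c in range(width)]


# Usage: a 2-channel, 2 x 2 map whose right column carries strong activations.
fmap = [
    [[0.0, 2.0], [0.0, 2.0]],  # channel 0
    [[0.0, 4.0], [0.0, 4.0]],  # channel 1
]
attended = spatial_attention(fmap)
seq = map_to_sequence(attended)
print(len(seq), len(seq[0]))  # 2 4
```

In this toy input the empty left column receives a gate of 0.5 (and stays zero), while the active right column receives a gate close to 1, so strongly responding positions pass through almost unchanged. A real encoder would learn the pooling weights with a small convolution and stack several such blocks inside the convolutional hierarchy.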

Similar Articles

1. SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition. IEEE Trans Image Process. 2021;30:1687-1701. doi: 10.1109/TIP.2020.3045602. Epub 2021 Jan 14.
2. Text Recognition Model Based on Multi-Scale Fusion CRNN. Sensors (Basel). 2023 Aug 8;23(16):7034. doi: 10.3390/s23167034.
3. Scene Text Recognition Based on Bidirectional LSTM and Deep Neural Network. Comput Intell Neurosci. 2021 Nov 23;2021:2676780. doi: 10.1155/2021/2676780. eCollection 2021.
4. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371. Epub 2016 Dec 29.
5. Irregular Scene Text Detection Based on a Graph Convolutional Network. Sensors (Basel). 2023 Jan 17;23(3):1070. doi: 10.3390/s23031070.
6. Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images. Data Brief. 2020 May 21;31:105749. doi: 10.1016/j.dib.2020.105749. eCollection 2020 Aug.
7. ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting. IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7123-7141. doi: 10.1109/TPAMI.2022.3223908. Epub 2023 May 5.
8. Sequential vessel segmentation via deep channel attention network. Neural Netw. 2020 Aug;128:172-187. doi: 10.1016/j.neunet.2020.05.005. Epub 2020 May 13.
