Text Recognition Model Based on Multi-Scale Fusion CRNN.

Authors

Zou Le, He Zhihuang, Wang Kai, Wu Zhize, Wang Yifan, Zhang Guanhong, Wang Xiaofeng

Affiliation

School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601, China.

Publication information

Sensors (Basel). 2023 Aug 8;23(16):7034. doi: 10.3390/s23167034.

Abstract

Scene text recognition is a crucial area of research in computer vision. However, current mainstream scene text recognition models suffer from incomplete feature extraction because they extract features at a small downsampling scale in order to obtain more features. This limitation hampers their ability to extract the complete features of each character in the image, which lowers text recognition accuracy. To address this issue, this paper proposes a novel text recognition model based on multi-scale fusion and the convolutional recurrent neural network (CRNN). The proposed model has a convolutional layer, a feature fusion layer, a recurrent layer, and a transcription layer. The convolutional layer extracts features at two scales, which yields two distinct outputs for the input text image. The feature fusion layer fuses the features from the two scales into a new feature. The recurrent layer learns contextual features from the input sequence of features. The transcription layer outputs the final result. The proposed model not only expands the recognition field but also learns more image features at different scales; thus, it extracts a more complete set of features and achieves better text recognition. Experimental results demonstrate that the proposed model outperforms the CRNN model in text recognition accuracy on scene text datasets such as Street View Text, IIIT-5K, ICDAR2003, and ICDAR2013.
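The data flow through the fusion and transcription stages can be illustrated with a small, dependency-free sketch. The abstract does not specify how the two scales are fused or how the transcription layer decodes, so everything below is an assumption for illustration only: the fusion rule (nearest-neighbour upsampling of the coarser sequence followed by channel concatenation), the greedy CTC-style decoding commonly paired with CRNN, and all function names are hypothetical. The convolutional and recurrent layers are omitted, and the per-frame scores are made up.

```python
# Hypothetical sketch only: the fusion rule, the decoder, and all names are
# assumptions for illustration, not the paper's implementation.

def upsample_nearest(seq, target_len):
    """Stretch a list of feature vectors to target_len by repeating entries."""
    n = len(seq)
    return [seq[(i * n) // target_len] for i in range(target_len)]


def fuse_scales(fine, coarse):
    """Fuse two feature sequences extracted at different scales.

    The coarser sequence is stretched to the length of the finer one, then
    the two vectors at each time step are concatenated along the channel axis.
    """
    coarse_up = upsample_nearest(coarse, len(fine))
    return [f + c for f, c in zip(fine, coarse_up)]  # list + list = concat


def ctc_greedy_decode(frame_scores, classes, blank=0):
    """Greedy CTC-style decoding: best class per frame, merge repeats, drop blanks."""
    decoded = []
    prev = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank and best != prev:
            decoded.append(classes[best])
        prev = best
    return "".join(decoded)


if __name__ == "__main__":
    # Two toy feature sequences: 4 time steps (fine) and 2 time steps (coarse).
    fine = [[1.0], [2.0], [3.0], [4.0]]
    coarse = [[10.0], [20.0]]
    print(fuse_scales(fine, coarse))

    # Made-up per-frame scores over the classes "-", "h", "i" (index 0 = blank).
    scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(scores, "-hi"))
```

In a real model the fused sequence would pass through the recurrent layer before decoding; the sketch only shows how two sequences of different lengths can be aligned into one, and how per-frame predictions collapse into a string.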

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8dde/10459494/76088418bc08/sensors-23-07034-g001.jpg
