Mogo Auto Intelligence and Telematics Information Technology Co., Ltd, Beijing, China.
Sci Rep. 2021 Nov 26;11(1):22978. doi: 10.1038/s41598-021-01520-y.
Sequence recognition in natural scene images has long been an important research topic in computer vision. CRNN has proven to be a popular end-to-end character sequence recognition network. However, CRNN does not account for wide characters, and it is less effective at recognizing long sequences of small, densely packed characters. To address these shortcomings, we propose an improved CRNN network, named CRNN-RES, based on BiLSTM and multiple receptive fields. Specifically, on the one hand, CRNN-RES uses a dual pooling kernel to enhance the feature extraction ability of the CNN. On the other hand, the last RNN layer is modified by replacing the BiLSTM with a parameter-sharing BiLSTM network that uses recursive residuals, which reduces the number of network parameters and improves accuracy. In addition, we design a structure in the RNN layer, called the CRFC layer, that allows the length of the input data sequence to be configured flexibly. Extensive experiments comparing CRNN-RES with the original CRNN show that, when recognizing English characters and digits, CRNN-RES has 8,197,549 parameters, 133,752 fewer than CRNN. On the public datasets ICDAR 2003 (IC03), ICDAR 2013 (IC13), IIIT 5k-word (IIIT5k), and Street View Text (SVT), CRNN-RES achieves accuracies of 96.90%, 89.85%, 83.63%, and 82.96%, respectively, which are 1.40%, 3.15%, 5.43%, and 2.16% higher than those of CRNN.
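As a quick consistency check on the reported figures, the baseline CRNN numbers implied by the stated differences can be recovered with a few lines of Python. This is a minimal sketch. The constant and function names are illustrative only, and the accuracy calculation assumes the reported gains are absolute percentage-point differences rather than relative improvements, which the abstract does not state explicitly.

```python
# Figures reported in the abstract for CRNN-RES (English characters and digits).
CRNN_RES_PARAMS = 8_197_549
PARAM_REDUCTION = 133_752

# Per benchmark: (CRNN-RES accuracy in %, reported gain over CRNN).
RESULTS = {
    "IC03": (96.90, 1.40),
    "IC13": (89.85, 3.15),
    "IIIT5k": (83.63, 5.43),
    "SVT": (82.96, 2.16),
}


def implied_crnn_params(res_params: int, reduction: int) -> int:
    """Parameter count of the baseline CRNN implied by the reported reduction."""
    return res_params + reduction


def relative_param_reduction(res_params: int, reduction: int) -> float:
    """Fraction of the baseline CRNN's parameters removed by CRNN-RES."""
    return reduction / implied_crnn_params(res_params, reduction)


def implied_crnn_accuracy(results: dict) -> dict:
    """Baseline CRNN accuracy per dataset.

    ASSUMPTION: the reported gains are absolute percentage points.
    round() strips floating-point noise left by the subtraction.
    """
    return {name: round(acc - gain, 2) for name, (acc, gain) in results.items()}


if __name__ == "__main__":
    base_params = implied_crnn_params(CRNN_RES_PARAMS, PARAM_REDUCTION)
    share = relative_param_reduction(CRNN_RES_PARAMS, PARAM_REDUCTION)
    print(f"Implied CRNN parameters: {base_params:,} ({share:.2%} removed)")
    for name, acc in implied_crnn_accuracy(RESULTS).items():
        print(f"{name}: implied CRNN accuracy {acc:.2f}%")
```

Under that assumption, the implied CRNN baseline has 8,331,301 parameters (so CRNN-RES is about 1.6% smaller) and accuracies of 95.50%, 86.70%, 78.20%, and 80.80% on IC03, IC13, IIIT5k, and SVT, respectively.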