Ding Liuxu, Liu Yuefeng, Zhao Qiyan, Liu Yunong
School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China.
Sensors (Basel). 2024 Dec 11;24(24):7917. doi: 10.3390/s24247917.
Text recognition is a rapidly evolving task with broad practical applications across many industries. It remains challenging, however, because text can be arranged in arbitrary shapes, rendered in irregular fonts, and unintentionally occluded. To handle images with arbitrarily shaped text arrangements and irregular fonts, we design the Discriminative Standard Text Font (DSTF) and the Feature Alignment and Complementary Fusion (FACF) components. To address occlusion, we propose a Dual Attention Serial Module (DASM), which is integrated between residual modules to strengthen the focus on text texture. Together, these components improve text recognition by correcting irregular text and aligning it with the original feature extraction, thereby complementing the overall recognition process. Additionally, to support research on text recognition in natural scenes, we developed the VBC Chinese dataset, which covers varying lighting conditions, including strong light, weak light, darkness, and other natural environments. Experimental results show that our method achieves competitive performance, with an accuracy of 90.8% on the VBC dataset and an overall average accuracy of 93.8%.