

Scene Uyghur Recognition Based on Visual Prediction Enhancement.

Author Information

Liu Yaqi, Kong Fanjie, Xu Miaomiao, Silamu Wushour, Li Yanbing

Affiliations

College of Information Science and Engineering, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China.

Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China.

Publication Information

Sensors (Basel). 2023 Oct 20;23(20):8610. doi: 10.3390/s23208610.

Abstract

To address the oblique deformation, character adhesion, and inter-character similarity of Uyghur text in scene images, this paper proposes a scene Uyghur recognition model with enhanced visual prediction. First, the content-aware rectification network TPS++ performs feature-level correction of skewed text. Then, ABINet is used as the base recognition network, and the U-Net structure in its vision model is improved to aggregate horizontal features, suppress multiple-activation phenomena, better describe the spatial characteristics of character positions, and alleviate character adhesion. Finally, a visual masking semantic awareness (VMSA) module is added, which masks the corresponding visual features on the attention map to guide the vision model to consider language information in the visual space and thereby produce more accurate visual predictions. This module not only reduces the correction burden on the language model but also uses language information to distinguish similar characters. Ablation experiments verify the effectiveness of the improvements, and the model is compared with common scene text recognition methods and existing scene Uyghur recognition methods on a self-built scene Uyghur dataset.
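The core operation the abstract attributes to the VMSA module is occluding the visual features that a given character attends to, so the vision model must fall back on surrounding (linguistic) context to predict it. The sketch below illustrates that masking step only, in NumPy rather than the paper's actual framework; the function name `vmsa_mask`, the flattened feature layout, and the threshold scheme are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def vmsa_mask(features, attn, char_idx, threshold=0.5, mask_value=0.0):
    """Occlude the visual evidence of one character.

    features: (N, C) flattened visual feature map (N = H * W positions).
    attn:     (T, N) per-character attention maps over those positions.
    char_idx: index of the character whose attended features are masked.

    Returns a copy of `features` with the strongly attended positions
    replaced by `mask_value`, forcing any subsequent prediction of that
    character to rely on context rather than its own visual features.
    """
    # Min-max normalize the selected attention map to [0, 1] so a
    # fixed threshold selects the strongly attended positions.
    a = attn[char_idx]
    a = (a - a.min()) / (a.max() - a.min() + 1e-8)

    masked = features.copy()
    masked[a >= threshold] = mask_value  # zero out that character's evidence
    return masked
```

Applied during training, this kind of masking acts like a character-level cloze task inside the vision branch, which is consistent with the abstract's claim that it shifts part of the language-modeling burden off the separate language model.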


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/146e0bd640c0/sensors-23-08610-g001.jpg
