

Scene Uyghur Recognition Based on Visual Prediction Enhancement

Authors

Liu Yaqi, Kong Fanjie, Xu Miaomiao, Silamu Wushour, Li Yanbing

Affiliations

College of Information Science and Engineering, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China.

Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China.

Publication

Sensors (Basel). 2023 Oct 20;23(20):8610. doi: 10.3390/s23208610.

DOI: 10.3390/s23208610
PMID: 37896702
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10610570/
Abstract

Aiming at the problems of Uyghur oblique deformation, character adhesion and character similarity in scene images, this paper proposes a scene Uyghur recognition model with enhanced visual prediction. First, the content-aware correction network TPS++ is used to perform feature-level correction for skewed text. Then, ABINet is used as the basic recognition network, and the U-Net structure in the vision model is improved to aggregate horizontal features, suppress multiple activation phenomena, better describe the spatial characteristics of character positions, and alleviate the problem of character adhesion. Finally, a visual masking semantic awareness (VMSA) module is added to guide the vision model to consider the language information in the visual space by masking the corresponding visual features on the attention map to obtain more accurate visual prediction. This module can not only alleviate the correction load of the language model, but also distinguish similar characters using the language information. The effectiveness of the improved method is verified by ablation experiments, and the model is compared with common scene text recognition methods and scene Uyghur recognition methods on the self-built scene Uyghur dataset.
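The core idea of the VMSA module described above — masking the visual features that the attention map attributes to a character position, so that the prediction for that position must lean on language context — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation; the function name, soft-masking scheme, and tensor shapes are assumptions for clarity.

```python
import numpy as np

def vmsa_mask(features, attn, mask_idx):
    """Soft-mask the visual evidence for one character position.

    features: (HW, C) flattened spatial feature map
    attn:     (T, HW) per-character attention over spatial positions
    mask_idx: index of the character position to mask

    Returns a copy of `features` in which the regions that character
    `mask_idx` attends to are attenuated, forcing a downstream
    language-aware head to infer that character from context.
    """
    w = attn[mask_idx][:, None]       # (HW, 1) attention weights
    return features * (1.0 - w)      # suppress attended regions

# Toy example: 4 spatial positions, 3 channels, 2 character slots.
feats = np.ones((4, 3))
attn = np.zeros((2, 4))
attn[0, 1] = 1.0                      # character 0 attends to position 1
masked = vmsa_mask(feats, attn, 0)    # position 1 is zeroed, rest intact
```

In the paper the masked prediction additionally supervises the vision model so it internalizes language information; the sketch only shows the masking step itself.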


Figures (g001–g011):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/146e0bd640c0/sensors-23-08610-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/72eb0bde96cc/sensors-23-08610-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/73bfd76a3cec/sensors-23-08610-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/7990b86288ce/sensors-23-08610-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/3e0915e5ef9e/sensors-23-08610-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/954f698c44ef/sensors-23-08610-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/ed965ee5669c/sensors-23-08610-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/790d233609f4/sensors-23-08610-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/2f88dd3fe3bc/sensors-23-08610-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/ed31612d564f/sensors-23-08610-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f04/10610570/e35f91720bde/sensors-23-08610-g011.jpg

Similar Articles

1. Scene Uyghur Recognition Based on Visual Prediction Enhancement.
Sensors (Basel). 2023 Oct 20;23(20):8610. doi: 10.3390/s23208610.
2. ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting.
IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7123-7141. doi: 10.1109/TPAMI.2022.3223908. Epub 2023 May 5.
3. Scene Uyghur Text Detection Based on Fine-Grained Feature Representation.
Sensors (Basel). 2022 Jun 9;22(12):4372. doi: 10.3390/s22124372.
4. Display-Semantic Transformer for Scene Text Recognition.
Sensors (Basel). 2023 Sep 28;23(19):8159. doi: 10.3390/s23198159.
5. An Algorithm Based on Text Position Correction and Encoder-Decoder Network for Text Recognition in the Scene Image of Visual Sensors.
Sensors (Basel). 2020 May 22;20(10):2942. doi: 10.3390/s20102942.
6. SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition.
IEEE Trans Image Process. 2021;30:1687-1701. doi: 10.1109/TIP.2020.3045602. Epub 2021 Jan 14.
7. PETR: Rethinking the Capability of Transformer-Based Language Model in Scene Text Recognition.
IEEE Trans Image Process. 2022;31:5585-5598. doi: 10.1109/TIP.2022.3197981. Epub 2022 Aug 30.
8. Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition.
IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12908-12921. doi: 10.1109/TPAMI.2022.3230962. Epub 2023 Oct 3.
9. Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images.
Data Brief. 2020 May 21;31:105749. doi: 10.1016/j.dib.2020.105749. eCollection 2020 Aug.
10. Attention-Based Scene Text Detection on Dual Feature Fusion.
Sensors (Basel). 2022 Nov 23;22(23):9072. doi: 10.3390/s22239072.

References Cited in This Article

1. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification.
IEEE Trans Pattern Anal Mach Intell. 2019 Sep;41(9):2035-2048. doi: 10.1109/TPAMI.2018.2848939. Epub 2018 Jun 25.
2. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition.
IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371. Epub 2016 Dec 29.