
Text Font Correction and Alignment Method for Scene Text Recognition.

Authors

Ding Liuxu, Liu Yuefeng, Zhao Qiyan, Liu Yunong

Affiliation

School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China.

Publication

Sensors (Basel). 2024 Dec 11;24(24):7917. doi: 10.3390/s24247917.

DOI: 10.3390/s24247917
PMID: 39771658
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11679380/
Abstract

Text recognition is a rapidly evolving task with broad practical applications across multiple industries. However, due to the arbitrary-shape text arrangement, irregular text font, and unintended occlusion of font, this remains a challenging task. To handle images with arbitrary-shape text arrangement and irregular text font, we designed the Discriminative Standard Text Font (DSTF) and the Feature Alignment and Complementary Fusion (FACF). To address the unintended occlusion of font, we propose a Dual Attention Serial Module (DASM), which is integrated between residual modules to enhance the focus on text texture. These components improve text recognition by correcting irregular text and aligning it with the original feature extraction, thus complementing the overall recognition process. Additionally, to enhance the study of text recognition in natural scenes, we developed the VBC Chinese dataset under varying lighting conditions, including strong light, weak light, darkness, and other natural environments. Experimental results show that our method achieves competitive performance on the VBC dataset with an accuracy of 90.8% and an overall average accuracy of 93.8%.
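
The abstract says only that the DASM applies two attention steps in series and sits between residual modules; it does not say which attention types are used or how they are parameterised. The NumPy sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it assumes channel attention followed by spatial attention (in the style of CBAM), a parameter-free spatial mask, and a toy residual block built from a 1x1 channel mix. All function names, shapes, and weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, H, W) feature map (assumed design).

    Global average pooling gives one value per channel, a two-layer
    bottleneck (w1: (C//r, C), w2: (C, C//r)) turns it into gates in (0, 1).
    """
    pooled = feat.mean(axis=(1, 2))            # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)      # ReLU, (C//r,)
    gates = sigmoid(w2 @ hidden)               # (C,)
    return feat * gates[:, None, None]


def spatial_attention(feat):
    """Reweight spatial positions of a (C, H, W) feature map (assumed design).

    The mask is the sigmoid of the per-pixel channel mean plus channel max;
    a learned convolution is deliberately left out to keep the sketch small.
    """
    mask = sigmoid(feat.mean(axis=0) + feat.max(axis=0))   # (H, W)
    return feat * mask[None, :, :]


def dual_attention_serial(feat, w1, w2):
    """Apply the two attention steps one after the other (serially)."""
    return spatial_attention(channel_attention(feat, w1, w2))


def residual_block(feat, weight):
    """Toy residual block: 1x1 channel mixing, ReLU, then a skip connection."""
    mixed = np.einsum("oc,chw->ohw", weight, feat)
    return feat + np.maximum(mixed, 0.0)


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 16, 2                       # hypothetical sizes
x = rng.standard_normal((C, H, W))
w_res_a = rng.standard_normal((C, C)) * 0.1
w_res_b = rng.standard_normal((C, C)) * 0.1
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

# Attention module placed between two residual blocks.
y = residual_block(x, w_res_a)
y = dual_attention_serial(y, w1, w2)
y = residual_block(y, w_res_b)
print(y.shape)
```

Because both gates lie strictly between 0 and 1, the attention step can only attenuate activations, never amplify them; the surrounding residual blocks are what carry the unmodified signal forward.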


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/8d456fceb1a4/sensors-24-07917-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/eab03d52cf82/sensors-24-07917-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/7d65429f85cf/sensors-24-07917-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/741121d0812b/sensors-24-07917-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/f61ccb3cacb1/sensors-24-07917-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/0b2d8c29e6f6/sensors-24-07917-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/d5988d4df862/sensors-24-07917-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/46eaf3099062/sensors-24-07917-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/2bebc53d9c81/sensors-24-07917-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/f3c9c9e7d430/sensors-24-07917-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/3b5eb129f852/sensors-24-07917-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/83b8b9b86bff/sensors-24-07917-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/e750940a933f/sensors-24-07917-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/cb7e2558028e/sensors-24-07917-g014a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b05/11679380/574d668d5d5a/sensors-24-07917-g015.jpg

Similar articles

1. Text Font Correction and Alignment Method for Scene Text Recognition.
   Sensors (Basel). 2024 Dec 11;24(24):7917. doi: 10.3390/s24247917.
2. Scene Uyghur Text Detection Based on Fine-Grained Feature Representation.
   Sensors (Basel). 2022 Jun 9;22(12):4372. doi: 10.3390/s22124372.
3. Attention Guided Feature Encoding for Scene Text Recognition.
   J Imaging. 2022 Oct 8;8(10):276. doi: 10.3390/jimaging8100276.
4. An Algorithm Based on Text Position Correction and Encoder-Decoder Network for Text Recognition in the Scene Image of Visual Sensors.
   Sensors (Basel). 2020 May 22;20(10):2942. doi: 10.3390/s20102942.
5. MTSTR: Multi-task learning for low-resolution scene text recognition via dual attention mechanism and its application in logistics industry.
   PLoS One. 2023 Dec 12;18(12):e0294943. doi: 10.1371/journal.pone.0294943. eCollection 2023.
6. Towards End-to-End Text Spotting in Natural Scenes.
   IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7266-7281. doi: 10.1109/TPAMI.2021.3095916. Epub 2022 Sep 14.
7. Arbitrary Font Generation by Encoder Learning of Disentangled Features.
   Sensors (Basel). 2022 Mar 19;22(6):2374. doi: 10.3390/s22062374.
8. Scene Text Detection Based on Two-Branch Feature Extraction.
   Sensors (Basel). 2022 Aug 20;22(16):6262. doi: 10.3390/s22166262.
9. A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion.
   Sensors (Basel). 2024 Jun 9;24(12):3758. doi: 10.3390/s24123758.
10. A Robot Object Recognition Method Based on Scene Text Reading in Home Environments.
    Sensors (Basel). 2021 Mar 9;21(5):1919. doi: 10.3390/s21051919.
