

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2019 Sep;41(9):2035-2048. doi: 10.1109/TPAMI.2018.2848939. Epub 2018 Jun 25.

DOI: 10.1109/TPAMI.2018.2848939
PMID: 29994467
Abstract

A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector.
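The recognition network described in the abstract decodes a character sequence by attending over encoder features at each step. As an illustrative sketch only (ASTER itself uses a learned additive attention inside an LSTM decoder; the dot-product scoring and toy vectors below are simplifications, not the paper's implementation), one attention step in plain Python looks like this:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_step(decoder_state, encoder_features):
    """One attentional decoding step: score each encoder feature
    against the decoder state, normalize with softmax, and return
    the attention weights plus the weighted context vector."""
    scores = [dot(decoder_state, h) for h in encoder_features]
    weights = softmax(scores)
    dim = len(encoder_features[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_features))
               for d in range(dim)]
    return weights, context

# Toy example (hypothetical numbers): three encoder feature
# vectors and one decoder state of matching dimension.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [1.0, 0.0]
weights, context = attention_step(state, features)
```

The context vector would then be fed, together with the previous character embedding, into the decoder to predict the next character; repeating this per step yields the output sequence directly from the rectified image.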


Similar articles

1. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification.
   IEEE Trans Pattern Anal Mach Intell. 2019 Sep;41(9):2035-2048. doi: 10.1109/TPAMI.2018.2848939. Epub 2018 Jun 25.
2. SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition.
   IEEE Trans Image Process. 2021;30:1687-1701. doi: 10.1109/TIP.2020.3045602. Epub 2021 Jan 14.
3. PETR: Rethinking the Capability of Transformer-Based Language Model in Scene Text Recognition.
   IEEE Trans Image Process. 2022;31:5585-5598. doi: 10.1109/TIP.2022.3197981. Epub 2022 Aug 30.
4. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes.
   IEEE Trans Pattern Anal Mach Intell. 2021 Feb;43(2):532-548. doi: 10.1109/TPAMI.2019.2937086. Epub 2021 Jan 11.
5. Towards End-to-End Text Spotting in Natural Scenes.
   IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7266-7281. doi: 10.1109/TPAMI.2021.3095916. Epub 2022 Sep 14.
6. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition.
   IEEE Trans Pattern Anal Mach Intell. 2017 Nov;39(11):2298-2304. doi: 10.1109/TPAMI.2016.2646371. Epub 2016 Dec 29.
7. Deep end-to-end rolling shutter rectification.
   J Opt Soc Am A Opt Image Sci Vis. 2020 Oct 1;37(10):1574-1582. doi: 10.1364/JOSAA.388818.
8. TextBoxes++: A Single-Shot Oriented Scene Text Detector.
   IEEE Trans Image Process. 2018 Aug;27(8):3676-3690. doi: 10.1109/TIP.2018.2825107. Epub 2018 Apr 9.
9. Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images.
   Data Brief. 2020 May 21;31:105749. doi: 10.1016/j.dib.2020.105749. eCollection 2020 Aug.
10. MA-CharNet: Multi-angle fusion character recognition network.
   PLoS One. 2022 Aug 29;17(8):e0272601. doi: 10.1371/journal.pone.0272601. eCollection 2022.

Cited by

1. Single-Character-Based Embedding Feature Aggregation Using Cross-Attention for Scene Text Super-Resolution.
   Sensors (Basel). 2025 Apr 2;25(7):2228. doi: 10.3390/s25072228.
2. Human-AI Collaboration for Remote Sighted Assistance: Perspectives from the LLM Era.
   Future Internet. 2024 Jul;16(7). doi: 10.3390/fi16070254. Epub 2024 Jul 18.
3. Opportunities for Human-AI Collaboration in Remote Sighted Assistance.
   IUI. 2022 Mar;2022:63-78. doi: 10.1145/3490099.3511113. Epub 2022 Mar 22.
4. Text Font Correction and Alignment Method for Scene Text Recognition.
   Sensors (Basel). 2024 Dec 11;24(24):7917. doi: 10.3390/s24247917.
5. Visual place recognition from end-to-end semantic scene text features.
   Front Robot AI. 2024 Sep 16;11:1424883. doi: 10.3389/frobt.2024.1424883. eCollection 2024.
6. A real-time arbitrary-shape text detector.
   PLoS One. 2024 Apr 16;19(4):e0302234. doi: 10.1371/journal.pone.0302234. eCollection 2024.
7. ViTSTR-Transducer: Cross-Attention-Free Vision Transformer Transducer for Scene Text Recognition.
   J Imaging. 2023 Dec 13;9(12):276. doi: 10.3390/jimaging9120276.
8. Cofea: correlation-based feature selection for single-cell chromatin accessibility data.
   Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad458.
9. MTSTR: Multi-task learning for low-resolution scene text recognition via dual attention mechanism and its application in logistics industry.
   PLoS One. 2023 Dec 12;18(12):e0294943. doi: 10.1371/journal.pone.0294943. eCollection 2023.
10. Explainable Connectionist-Temporal-Classification-Based Scene Text Recognition.
   J Imaging. 2023 Nov 15;9(11):248. doi: 10.3390/jimaging9110248.