

A Scene-Text Synthesis Engine Achieved Through Learning From Decomposed Real-World Data.

Author information

Tang Zhengmi, Miyazaki Tomo, Omachi Shinichiro

Publication information

IEEE Trans Image Process. 2023;32:5837-5851. doi: 10.1109/TIP.2023.3326685. Epub 2023 Nov 1.

DOI:10.1109/TIP.2023.3326685
PMID:37889809
Abstract

Scene-text image synthesis techniques that aim to naturally compose text instances on background scene images are very appealing for training deep neural networks due to their ability to provide accurate and comprehensive annotation information. Prior studies have explored generating synthetic text images on two-dimensional and three-dimensional surfaces using rules derived from real-world observations. Some of these studies have proposed generating scene-text images through learning; however, owing to the absence of a suitable training dataset, unsupervised frameworks have been explored to learn from existing real-world data, which might not yield reliable performance. To ease this dilemma and facilitate research on learning-based scene text synthesis, we introduce DecompST, a real-world dataset prepared from some public benchmarks, containing three types of annotations: quadrilateral-level BBoxes, stroke-level text masks, and text-erased images. Leveraging the DecompST dataset, we propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet). TLPNet first predicts the suitable regions for text embedding, after which TAANet adaptively adjusts the geometry and color of the text instance to match the background context. After training, those networks can be integrated and utilized to generate the synthetic dataset for scene text analysis tasks. Comprehensive experiments were conducted to validate the effectiveness of the proposed LBTS along with existing methods, and the experimental results indicate the proposed LBTS can generate better pretraining data for scene text detectors. Our dataset and code are made available at: https://github.com/iiclab/DecompST.
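The two-stage pipeline described in the abstract (propose a location for the text, adapt the text's appearance to the background, then composite it) can be illustrated with a toy sketch. This is not the authors' LBTS and does not use their code. The learned TLPNet is replaced here by a sliding-window search over a score map that is simply given as input, TAANet's learned color adaptation is replaced by a black-or-white contrast rule, and geometry adaptation is omitted entirely. The function names `propose_region`, `pick_text_color`, and `composite` are hypothetical, and images are assumed to be float RGB arrays in [0, 1].

```python
import numpy as np

def propose_region(score_map: np.ndarray, box_h: int, box_w: int) -> tuple[int, int]:
    """Toy stand-in for TLPNet: return the top-left (y, x) of the
    box_h x box_w window with the highest mean score.
    Assumption: the score map is provided, not predicted by a network."""
    best, best_yx = -np.inf, (0, 0)
    H, W = score_map.shape
    for y in range(H - box_h + 1):
        for x in range(W - box_w + 1):
            s = score_map[y:y + box_h, x:x + box_w].mean()
            if s > best:
                best, best_yx = s, (y, x)
    return best_yx

def pick_text_color(region: np.ndarray) -> np.ndarray:
    """Toy stand-in for TAANet's color adaptation: pick black or white,
    whichever contrasts more with the region's mean luminance (ITU-R BT.601 weights)."""
    lum = (region[..., :3] @ np.array([0.299, 0.587, 0.114])).mean()
    return np.zeros(3) if lum > 0.5 else np.ones(3)

def composite(background: np.ndarray, mask: np.ndarray,
              color: np.ndarray, top_left: tuple[int, int]) -> np.ndarray:
    """Alpha-blend a stroke-level text mask (values in [0, 1]) of a single
    color onto a copy of the background at top_left."""
    out = background.copy()
    y, x = top_left
    h, w = mask.shape
    alpha = mask[..., None]                      # (h, w, 1) broadcasts over RGB
    out[y:y + h, x:x + w] = alpha * color + (1 - alpha) * out[y:y + h, x:x + w]
    return out

# Usage: a light, text-free background (cf. DecompST's text-erased images)
bg = np.full((8, 8, 3), 0.9)
score = np.zeros((8, 8))
score[4:6, 2:6] = 1.0                            # hypothetical "good location" scores
mask = np.ones((2, 4))                           # hypothetical solid stroke mask
yx = propose_region(score, 2, 4)
color = pick_text_color(bg[yx[0]:yx[0] + 2, yx[1]:yx[1] + 4])
out = composite(bg, mask, color, yx)
```

Because the background region is light, the contrast rule chooses black text, and only the proposed 2×4 window is modified. In LBTS both the location scores and the appearance adjustments are learned from the DecompST annotations rather than hand-coded as above.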


Similar articles

1
A Scene-Text Synthesis Engine Achieved Through Learning From Decomposed Real-World Data.
IEEE Trans Image Process. 2023;32:5837-5851. doi: 10.1109/TIP.2023.3326685. Epub 2023 Nov 1.
2
Stroke-Based Scene Text Erasing Using Synthetic Data for Training.
IEEE Trans Image Process. 2021;30:9306-9320. doi: 10.1109/TIP.2021.3125260. Epub 2021 Nov 12.
3
Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images.
Data Brief. 2020 May 21;31:105749. doi: 10.1016/j.dib.2020.105749. eCollection 2020 Aug.
4
EraseNet: End-to-End Text Removal in the Wild.
IEEE Trans Image Process. 2020 Aug 28;PP. doi: 10.1109/TIP.2020.3018859.
5
Real-World Image Denoising with Deep Boosting.
IEEE Trans Pattern Anal Mach Intell. 2020 Dec;42(12):3071-3087. doi: 10.1109/TPAMI.2019.2921548. Epub 2020 Nov 3.
6
Txt2Img-MHN: Remote Sensing Image Generation From Text Using Modern Hopfield Networks.
IEEE Trans Image Process. 2023;32:5737-5750. doi: 10.1109/TIP.2023.3323799. Epub 2023 Oct 24.
7
Kernel Proposal Network for Arbitrary Shape Text Detection.
IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):8731-8742. doi: 10.1109/TNNLS.2022.3152596. Epub 2023 Oct 27.
8
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
9
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation.
IEEE Trans Vis Comput Graph. 2024 Dec;30(12):7473-7485. doi: 10.1109/TVCG.2023.3340679. Epub 2024 Oct 28.
10
SIR: Self-Supervised Image Rectification via Seeing the Same Scene From Multiple Different Lenses.
IEEE Trans Image Process. 2023;32:865-877. doi: 10.1109/TIP.2022.3231087. Epub 2023 Jan 23.