
DPNet: Scene text detection based on dual perspective CNN-transformer.

Affiliation

School of Physics and Electronic-Electrical Engineering, ABA Teachers University, Wenchuan, Aba Tibetan and Qiang Autonomous Prefecture, Sichuan, China.

Publication

PLoS One. 2024 Oct 21;19(10):e0309286. doi: 10.1371/journal.pone.0309286. eCollection 2024.

DOI: 10.1371/journal.pone.0309286
PMID: 39432472
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11493292/
Abstract

With the continuous advancement of deep learning, research in scene text detection has evolved significantly. However, complex backgrounds and varied text forms make detecting text in images difficult. A CNN automatically extracts features through convolution operations; in scene text detection it captures local text features well, but it lacks a global view of the image. Transformers, whose recent applications in computer vision inspired this work, can capture the global information of an image and describe it directly. This paper therefore proposes scene text detection based on a dual-perspective CNN-transformer. The proposed channel enhanced self-attention module (CESAM) and spatial enhanced self-attention module (SESAM) are integrated into a standard ResNet backbone. This integration facilitates learning the global contextual information and positional relationships of text, alleviating the difficulty of detecting small target text. Furthermore, a feature decoder is introduced to refine the effective text information in the feature map and enhance the perception of fine detail. Experiments show that the proposed method significantly improves the model's robustness across different types of text. Compared to the baseline, it achieves performance improvements of 2.51% (83.81 vs. 81.3) on the Total-Text dataset, 1.87% (86.07 vs. 84.2) on ICDAR 2015, and 3.63% (86.72 vs. 83.09) on MSRA-TD500, while also producing better visual results.
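The abstract does not give the internals of CESAM or SESAM. As a rough, hypothetical sketch of the underlying idea — self-attention computed across the channel dimension of a CNN feature map, so each channel gains a global receptive field — here is a minimal NumPy version using plain dot-product attention with no learned projections (this is an illustration, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(feat):
    # feat: (C, H, W) CNN feature map. Attention weights are computed
    # between whole channels, so every output channel aggregates
    # information from all spatial positions at once.
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)                     # one row per channel
    affinity = softmax(x @ x.T / np.sqrt(H * W))   # (C, C) channel affinities
    out = affinity @ x                             # mix channels globally
    return feat + out.reshape(C, H, W)             # residual connection

feat = np.random.default_rng(0).normal(size=(8, 16, 16))
out = channel_self_attention(feat)
print(out.shape)  # (8, 16, 16)
```

A spatial counterpart (analogous in spirit to SESAM) would instead flatten the map to (H·W, C) rows and attend between spatial positions rather than between channels.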

Similar Articles

1. DPNet: Scene text detection based on dual perspective CNN-transformer.
   PLoS One. 2024 Oct 21;19(10):e0309286. doi: 10.1371/journal.pone.0309286. eCollection 2024.
2. Dual encoder network with transformer-CNN for multi-organ segmentation.
   Med Biol Eng Comput. 2023 Mar;61(3):661-671. doi: 10.1007/s11517-022-02723-9. Epub 2022 Dec 29.
3. Scene Text Detection Based on Two-Branch Feature Extraction.
   Sensors (Basel). 2022 Aug 20;22(16):6262. doi: 10.3390/s22166262.
4. A coordinated adaptive multiscale enhanced spatio-temporal fusion network for multi-lead electrocardiogram arrhythmia detection.
   Sci Rep. 2024 Sep 6;14(1):20828. doi: 10.1038/s41598-024-71700-z.
5. Res2Net-based multi-scale and multi-attention model for traffic scene image classification.
   PLoS One. 2024 May 20;19(5):e0300017. doi: 10.1371/journal.pone.0300017. eCollection 2024.
6. A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion.
   Sensors (Basel). 2024 Jun 9;24(12):3758. doi: 10.3390/s24123758.
7. TGDAUNet: Transformer and GCNN based dual-branch attention UNet for medical image segmentation.
   Comput Biol Med. 2023 Dec;167:107583. doi: 10.1016/j.compbiomed.2023.107583. Epub 2023 Oct 21.
8. FDB-Net: Fusion double branch network combining CNN and transformer for medical image segmentation.
   J Xray Sci Technol. 2024;32(4):931-951. doi: 10.3233/XST-230413.
9. Transformer guided self-adaptive network for multi-scale skin lesion image segmentation.
   Comput Biol Med. 2024 Feb;169:107846. doi: 10.1016/j.compbiomed.2023.107846. Epub 2023 Dec 23.
10. ScribFormer: Transformer Makes CNN Work Better for Scribble-Based Medical Image Segmentation.
    IEEE Trans Med Imaging. 2024 Jun;43(6):2254-2265. doi: 10.1109/TMI.2024.3363190. Epub 2024 Jun 3.
