
Attention-Based Scene Text Detection on Dual Feature Fusion.

Affiliation

Xinjiang Multilingual Information Technology Laboratory, Xinjiang Multilingual Information Technology Research Center, College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China.

Publication

Sensors (Basel). 2022 Nov 23;22(23):9072. doi: 10.3390/s22239072.

Abstract

Segmentation-based scene text detection algorithms perform well on text with arbitrary shapes and extreme aspect ratios, owing to their pixel-level description and fine-grained post-processing. However, insufficient use of semantic and spatial information limits a network's classification and localization capabilities, and existing scene text detection methods lose important feature information when extracting features at each network layer. To address this problem, the Attention-based Dual Feature Fusion Model (ADFM) is proposed. Its Bi-directional Feature Fusion Pyramid Module (BFM) first adds stronger semantic information to the higher-resolution feature maps through a top-down process, then reduces the aliasing effects introduced by that process through a bottom-up process, thereby enhancing the representation of multi-scale text semantics. Meanwhile, a position-sensitive Spatial Attention Module (SAM) is introduced between the two fusion stages. It operates on the feature map with the highest resolution and strongest semantics produced by the top-down process and weights spatial positions by the relevance of text features, improving the network's sensitivity to text regions. The effectiveness of each ADFM module was verified by ablation experiments, and the model was compared with recent scene text detection methods on several publicly available datasets.
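The two-stage fusion described above (top-down semantic injection, attention on the highest-resolution map, then bottom-up refinement) can be sketched in miniature. This is not the authors' implementation: the function names, the nearest-neighbor upsampling, average-pool downsampling, and sigmoid-based gating used for the attention step are all simplifying assumptions, and real BFM/SAM modules would use learned convolutions.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (H, W) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    # 2x2 average pooling, halving each spatial dimension.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def spatial_attention(x):
    # Placeholder for SAM: gate each position by a sigmoid of its own
    # activation, so stronger text responses get larger spatial weights.
    return x * (1.0 / (1.0 + np.exp(-x)))

def bidirectional_fusion(feats):
    """feats: list of (H, W) maps ordered high resolution -> low resolution."""
    # Top-down pass: inject coarse semantics into higher-resolution maps.
    td = list(feats)
    for i in range(len(td) - 2, -1, -1):
        td[i] = td[i] + upsample2x(td[i + 1])
    # Attention on the highest-resolution, semantically strongest map.
    td[0] = spatial_attention(td[0])
    # Bottom-up pass: propagate refined detail back down to reduce aliasing.
    bu = list(td)
    for i in range(1, len(bu)):
        bu[i] = bu[i] + downsample2x(bu[i - 1])
    return bu
```

A usage example: feeding three pyramid levels of shapes (8, 8), (4, 4), and (2, 2) returns three maps with the same shapes, each enriched by both fusion passes.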


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4dbd/9739706/e0e04e55dcdd/sensors-22-09072-g001.jpg
