Real-Time Scene Text Detection With Differentiable Binarization and Adaptive Scale Fusion.

Authors

Liao Minghui, Zou Zhisheng, Wan Zhaoyi, Yao Cong, Bai Xiang

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):919-931. doi: 10.1109/TPAMI.2022.3155612. Epub 2022 Dec 5.

Abstract

Recently, segmentation-based methods have drawn extensive attention in the scene text detection field because pixel-level descriptions let them detect text instances of arbitrary shapes and extreme aspect ratios. However, the vast majority of existing segmentation-based approaches are limited by their complex post-processing algorithms and by the scale robustness of their segmentation models: the post-processing algorithms are not only isolated from model optimization but also time-consuming, and scale robustness is usually strengthened by fusing multi-scale feature maps directly. In this paper, we propose a Differentiable Binarization (DB) module that integrates the binarization process, one of the most important steps in the post-processing procedure, into a segmentation network. Optimized along with the proposed DB module, the segmentation network produces more accurate results, which improves text detection accuracy while keeping the pipeline simple. Furthermore, we propose an efficient Adaptive Scale Fusion (ASF) module that improves scale robustness by fusing features of different scales adaptively. By incorporating the proposed DB and ASF modules into the segmentation network, our scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.
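
The abstract does not spell out how binarization is made differentiable. As a rough illustration only, the sketch below assumes one common way to do it: replacing the hard threshold with a steep sigmoid of the difference between a probability value and a threshold value. The function names and the steepness factor `k = 50.0` are illustrative assumptions, not details taken from this record.

```python
import math


def hard_binarize(p: float, t: float) -> float:
    """Standard binarization: a step function at threshold t.

    Its gradient is zero everywhere except at p == t, where it is
    undefined, so it cannot be trained jointly with a network.
    """
    return 1.0 if p >= t else 0.0


def soft_binarize(p: float, t: float, k: float = 50.0) -> float:
    """Assumed soft binarization: a steep sigmoid of (p - t).

    Approaches the step function as k grows, but stays differentiable
    with respect to both the probability p and the threshold t.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def soft_binarize_grad_p(p: float, t: float, k: float = 50.0) -> float:
    """Analytic derivative of soft_binarize with respect to p."""
    b = soft_binarize(p, t, k)
    return k * b * (1.0 - b)


if __name__ == "__main__":
    # Compare hard and soft outputs around a threshold of 0.3.
    for p in (0.10, 0.29, 0.30, 0.31, 0.90):
        print(p, hard_binarize(p, 0.3), soft_binarize(p, 0.3),
              soft_binarize_grad_p(p, 0.3))
```

Under this assumption the gradient is largest near the threshold, so pixels close to the decision boundary would receive the strongest training signal, while the forward pass stays close to an ordinary thresholding step.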
