

(HTBNet)Arbitrary Shape Scene Text Detection with Binarization of Hyperbolic Tangent and Cross-Entropy.

Author information

Chen Zhao

Affiliation

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430072, China.

Publication information

Entropy (Basel). 2024 Jun 29;26(7):560. doi: 10.3390/e26070560.

Abstract

Most existing segmentation-based scene text detection methods require complicated post-processing, and because this post-processing is separated from the training process, detection performance is greatly reduced. A previous method, DBNet, simplified post-processing and integrated it into the segmentation network. However, training the model took a long time (1200 epochs), and the model lacked sensitivity to text at various scales, so some text instances were missed. To address these two problems, we design the text detection Network with Binarization of Hyperbolic Tangent (HTBNet). First, we propose the Binarization of Hyperbolic Tangent (HTB); when the segmentation network is optimized together with HTB, its initial convergence is faster, and the number of training epochs drops from 1200 to 600. Because features in different channels of a feature map at the same scale focus on information from different regions of the image, we devise Multi-Scale Channel Attention (MSCA) to better represent the important features of all objects in the image. Meanwhile, considering that multi-scale objects in an image cannot all be detected simultaneously, we propose a novel module named Fused Module with Channel and Spatial (FMCS), which fuses the multi-scale feature maps along the channel and spatial dimensions. Finally, we adopt cross-entropy as the loss function to measure the difference between predicted values and the ground truth. The experimental results show that, compared with lightweight models, HTBNet achieves competitive performance and speed on Total-Text (F-measure: 86.0%, FPS: 30) and MSRA-TD500 (F-measure: 87.5%, FPS: 30).
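For reference, DBNet's differentiable binarization, the step the abstract contrasts HTB with, approximates a hard threshold with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is the probability map, T the learned threshold map, and k an amplifying factor (50 in the DBNet paper). The abstract does not give the HTB formula, so the minimal NumPy sketch below pairs the DB step with a hypothetical tanh-based step (`tanh_binarize`, rescaled to [0, 1]) and scores each with a pixel-wise binary cross-entropy against a ground-truth text mask. Applying the loss to the approximate binary map, and all function names and toy arrays, are illustrative assumptions, not the authors' code.

```python
import numpy as np


def db_binarize(prob, thresh, k=50.0):
    """DBNet's differentiable binarization: B = 1 / (1 + exp(-k * (P - T)))."""
    return 1.0 / (1.0 + np.exp(-k * (prob - thresh)))


def tanh_binarize(prob, thresh, k=50.0):
    """HYPOTHETICAL tanh-based step rescaled to [0, 1]; NOT the paper's HTB formula."""
    return 0.5 * (1.0 + np.tanh(k * (prob - thresh)))


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between a [0, 1] map and a 0/1 mask."""
    pred = np.clip(pred, eps, 1.0 - eps)  # keep log() finite at 0 and 1
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


# Toy 2x2 example (made-up values): probability map P, threshold map T, text mask.
prob = np.array([[0.9, 0.2], [0.6, 0.1]])
thresh = np.full_like(prob, 0.3)
gt = np.array([[1.0, 0.0], [1.0, 0.0]])

losses = {}
for name, step in [("DB", db_binarize), ("tanh (hypothetical)", tanh_binarize)]:
    approx_binary = step(prob, thresh)  # differentiable approximation of P > T
    losses[name] = binary_cross_entropy(approx_binary, gt)
    print(name, round(losses[name], 6))
```

Note that 0.5 (1 + tanh(x)) equals 1 / (1 + exp(-2x)), so this placeholder is only a sigmoid with twice the slope. It shows where a tanh-based step plugs into a cross-entropy-trained pipeline, not how the paper's HTB behaves.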

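The abstract motivates MSCA by noting that different channels of a feature map attend to different image regions, but it does not describe the module's internals. As a point of reference only, the sketch below implements the widely used squeeze-and-excitation style of channel attention in NumPy: each channel of a (C, H, W) feature map is average-pooled to a scalar, passed through a small bottleneck, and turned into a gate in (0, 1) that rescales that channel. This is a single-scale stand-in, not the paper's multi-scale module; the function name `channel_attention`, the reduction ratio `r`, and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map (hypothetical stand-in).

    w1 has shape (C // r, C) and w2 has shape (C, C // r): a two-layer bottleneck.
    Returns the reweighted feature map and the per-channel gates.
    """
    squeeze = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)  # ReLU bottleneck -> (C // r,)
    weights = sigmoid(w2 @ hidden)          # per-channel gate in (0, 1) -> (C,)
    return feat * weights[:, None, None], weights


# Random toy inputs so the block runs on its own.
rng = np.random.default_rng(0)
C, r = 8, 4
feat = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

out, weights = channel_attention(feat, w1, w2)
print(out.shape, weights.round(3))
```

In a trained network, `w1` and `w2` would be learned parameters; here they are random, so the gates carry no meaning beyond showing how channels are reweighted.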

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb54/11276546/c1c281090698/entropy-26-00560-g001.jpg
