

Mixed-Supervised Scene Text Detection With Expectation-Maximization Algorithm.

Author Information

Zhao Mengbiao, Feng Wei, Yin Fei, Zhang Xu-Yao, Liu Cheng-Lin

Publication Information

IEEE Trans Image Process. 2022;31:5513-5528. doi: 10.1109/TIP.2022.3197987. Epub 2022 Aug 22.

Abstract

Scene text detection is an important and challenging task in computer vision. To detect arbitrarily-shaped text, most existing methods require heavy data-labeling effort to produce polygon-level text region labels for supervised training. To reduce the cost of data labeling, we study mixed-supervised arbitrarily-shaped text detection by combining various weak supervision forms (e.g., image-level tags; coarse, loose, and tight bounding boxes), which are far easier to annotate. Because existing weakly-supervised learning methods (such as multiple instance learning) do not promote full object coverage, we propose an Expectation-Maximization (EM) based mixed-supervised learning framework that approaches the performance of fully-supervised detection while training a scene text detector on only a small amount of polygon-level annotated data combined with a large amount of weakly annotated data. The polygon-level labels are treated as latent variables and recovered from the weak labels by the EM algorithm. We also propose a new contour-based scene text detector to facilitate the use of weak labels in our mixed-supervised learning framework. Extensive experiments on six scene text benchmarks show that (1) using only 10% strongly annotated data and 90% weakly annotated data, our method yields performance comparable to that of fully-supervised methods, and (2) with 100% strongly annotated data, our method achieves state-of-the-art performance on five scene text benchmarks (CTW1500, Total-Text, ICDAR-ArT, MSRA-TD500, and C-SVT) and competitive results on the ICDAR2015 dataset. We will make our weakly annotated datasets publicly available.
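
The framework alternates between estimating latent polygon labels and updating the detector. A minimal sketch of that EM alternation, assuming loose bounding boxes as the weak label form and a hypothetical contour-based detector exposing predict_contours and compute_loss methods (neither name is from the paper), could look like the following PyTorch-style code; it illustrates only the E-step/M-step structure, not the authors' actual losses or consistency rules.

    # Sketch of the EM loop from the abstract (not the authors' code).
    # Polygon labels for weakly annotated images are latent variables:
    # re-estimated in the E-step, used to retrain in the M-step.
    import torch

    def box_of(polygon):
        # Axis-aligned bounding box (x0, y0, x1, y1) of an (N, 2) contour tensor.
        xs, ys = polygon[:, 0], polygon[:, 1]
        return xs.min(), ys.min(), xs.max(), ys.max()

    def consistent_with_loose_box(polygon, loose_box, margin=0.1):
        # Keep a predicted contour only if it lies (approximately) inside the
        # annotated loose bounding box -- one simple weak-supervision check.
        x0, y0, x1, y1 = box_of(polygon)
        bx0, by0, bx1, by1 = loose_box
        mw, mh = margin * (bx1 - bx0), margin * (by1 - by0)
        return (x0 >= bx0 - mw and y0 >= by0 - mh and
                x1 <= bx1 + mw and y1 <= by1 + mh)

    def e_step(detector, weak_images, loose_boxes):
        # E-step: recover latent polygon labels for weakly annotated images
        # from the current detector's contour predictions.
        detector.eval()
        pseudo_labels = []
        with torch.no_grad():
            for img, boxes in zip(weak_images, loose_boxes):
                polygons = detector.predict_contours(img)  # hypothetical API
                kept = [p for p in polygons
                        if any(consistent_with_loose_box(p, b) for b in boxes)]
                pseudo_labels.append(kept)
        return pseudo_labels

    def m_step(detector, optimizer, strong_data, weak_images, pseudo_labels):
        # M-step: update detector parameters on strongly annotated data plus
        # the pseudo polygon labels recovered in the E-step.
        detector.train()
        for img, polygons in list(strong_data) + list(zip(weak_images, pseudo_labels)):
            optimizer.zero_grad()
            loss = detector.compute_loss(img, polygons)  # hypothetical API
            loss.backward()
            optimizer.step()

    def train_mixed_supervised(detector, optimizer, strong_data,
                               weak_images, loose_boxes, em_rounds=10):
        for _ in range(em_rounds):
            pseudo = e_step(detector, weak_images, loose_boxes)
            m_step(detector, optimizer, strong_data, weak_images, pseudo)

In this reading, the strongly annotated subset anchors the detector so that the E-step's pseudo labels improve over rounds, which is what lets 10% strong plus 90% weak data approach fully-supervised performance.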

