Shen Yunhang, Ji Rongrong, Yang Kuiyuan, Deng Cheng, Wang Changhu
IEEE Trans Image Process. 2019 Aug 22. doi: 10.1109/TIP.2019.2933735.
Weakly supervised object detection has attracted increasing research attention recently. Most existing schemes score category-independent region proposals, formulating the task as a multiple instance learning (MIL) problem. In this process, the proposal scores are aggregated and supervised only by image-level labels, which often fails to locate object boundaries precisely. In this paper, we overcome this limitation by taking a closer look at the score aggregation stage and propose a Category-aware Spatial Constraint (CSC) scheme for proposals, which is integrated into weakly supervised object detection and trained end to end. In particular, we incorporate the global shape information of objects as an unsupervised constraint, inferred from built-in foreground and background cues termed Category-specific Pixel Gradient (CPG) maps. Each region proposal is weighted according to how well it covers the estimated shape of the objects. For each category, a multi-center regularization is further introduced to penalize violations between cluster centers and high-scoring proposals in a given image. Extensive experiments on the widely used Pascal VOC and COCO benchmarks show that our approach significantly improves weakly supervised object detection without adding new learnable parameters to existing models or changing the structure of the CNNs.
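To make the score-aggregation idea concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a CPG map is already available for one category as a 2-D grid of values in [0, 1], and that boxes are `(x0, y0, x1, y1)` with exclusive upper bounds. The helper names and formulas are hypothetical: `coverage_weight` uses an F-measure-style score as a stand-in for "how well a box covers the estimated shape", `weighted_image_score` aggregates reweighted proposal scores with a clipped sum, and `multi_center_penalty` measures the mean squared distance from the top-scoring proposal centers to their nearest given cluster center.

```python
# Hypothetical sketch of CSC-style proposal reweighting; the paper's exact
# formulation may differ. Names and formulas here are illustrative only.
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Grid = List[List[float]]         # assumed per-category CPG map, values in [0, 1]


def coverage_weight(cpg: Grid, box: Box, eps: float = 1e-8) -> float:
    """F-measure-style weight (assumption): combines the share of total map
    mass the box captures (recall) with how densely the map fills the box
    (precision)."""
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in cpg)
    inside = sum(cpg[y][x] for y in range(y0, y1) for x in range(x0, x1))
    area = max((x1 - x0) * (y1 - y0), 0)
    recall = inside / (total + eps)
    precision = inside / (area + eps)
    return 2 * precision * recall / (precision + recall + eps)


def weighted_image_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Reweight per-proposal scores, sum them, and clip to [0, 1] to obtain an
    image-level score that can be supervised by the image-level label."""
    total = sum(s * w for s, w in zip(scores, weights))
    return min(max(total, 0.0), 1.0)


def box_center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def multi_center_penalty(
    boxes: Sequence[Box],
    scores: Sequence[float],
    centers: Sequence[Tuple[float, float]],
    top_k: int,
) -> float:
    """Hypothetical regularizer: mean squared distance from each of the top_k
    highest-scoring proposal centers to its nearest cluster center."""
    ranked = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)[:top_k]
    if not ranked or not centers:
        return 0.0
    dists = []
    for i in ranked:
        cx, cy = box_center(boxes[i])
        dists.append(min((cx - mx) ** 2 + (cy - my) ** 2 for mx, my in centers))
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy 4x4 map whose foreground fills the top-left 2x2 corner.
    cpg = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
    boxes = [(0, 0, 2, 2), (0, 0, 4, 4), (2, 2, 4, 4)]
    weights = [coverage_weight(cpg, b) for b in boxes]
    scores = [0.5, 0.3, 0.2]
    print(weights, weighted_image_score(scores, weights))
```

With this toy map, the tight box `(0, 0, 2, 2)` receives a weight close to 1, the full-image box about 0.4, and the box on the empty corner 0, so the reweighting favors proposals that fit the object's extent rather than ones that merely contain it. The F-measure form is one simple way to reward both coverage and tightness; it is chosen here for illustration only.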