Wu Zhihao, Xu Yong, Yang Jian, Li Xuelong
IEEE Trans Image Process. 2024;33:3413-3427. doi: 10.1109/TIP.2024.3402981. Epub 2024 May 31.
Weakly supervised object detection (WSOD) aims to train detectors using only image-category labels. Current methods typically first generate dense class-agnostic proposals and then select objects based on the classification scores of these proposals. These methods mainly focus on selecting proposals that have high Intersection-over-Union (IoU) with the true object location, while ignoring the problem of misclassification, which occurs when some proposals exhibit semantic similarities with objects from other categories due to viewing perspective and background interference. We observe that a misclassified positive class typically has the following two characteristics: 1) it is usually misclassified as one or a few specific negative classes, and the scores of these negative classes are high; 2) compared to other negative classes, the score of the positive class is relatively high. Based on these two characteristics, we propose misclassification correction (MCC) and misclassification tolerance (MCT), respectively. In MCC, we establish a misclassification memory bank that, in the early stage of training, records and summarizes the class pairs with high frequencies of potential misclassification, that is, cases where the score of a negative class is significantly higher than that of the positive class. In the later stage of training, when such a case occurs and corresponds to one of the summarized class pairs, we select the top-scoring proposal of the negative class as the positive training example. In MCT, we decrease the loss weights of misclassified classes in the later stage of training, so that they do not dominate training and cause objects from other, semantically similar classes to be misclassified during inference. Extensive experiments on PASCAL VOC and MS COCO demonstrate that our method alleviates the misclassification problem and achieves state-of-the-art results.
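The MCC and MCT steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract gives no thresholds, data layout, or schedule, so the names MisclassificationMemoryBank, select_positive_proposal, and mct_loss_weights, and the parameters margin, min_count, and reduced_weight, are all hypothetical. The rule for deciding which classes count as misclassified in MCT is also not given here, so that set is simply passed in by the caller.

```python
from collections import Counter


def best_score_per_class(scores):
    """scores[p][c] is the score of proposal p for class c; return the max per class."""
    num_classes = len(scores[0])
    return [max(row[c] for row in scores) for c in range(num_classes)]


class MisclassificationMemoryBank:
    """Counts (positive, negative) class pairs seen as potential misclassifications."""

    def __init__(self, margin=0.2, min_count=3):
        self.margin = margin        # assumed gap that counts as "significantly higher"
        self.min_count = min_count  # assumed frequency needed to keep a class pair
        self.counts = Counter()

    def record(self, scores, positive):
        """Early stage: count negative classes that outscore the positive class."""
        best = best_score_per_class(scores)
        for c, score in enumerate(best):
            if c != positive and score - best[positive] > self.margin:
                self.counts[(positive, c)] += 1

    def frequent_pairs(self):
        """Summarize the class pairs recorded at least min_count times."""
        return {pair for pair, n in self.counts.items() if n >= self.min_count}


def select_positive_proposal(scores, positive, bank):
    """Later stage (MCC): return the index of the proposal used as the positive example.

    If a negative class outscores the positive class by the margin and the pair
    is among the summarized pairs, take that negative class's top-scoring
    proposal; otherwise take the positive class's top-scoring proposal.
    """
    best = best_score_per_class(scores)
    pairs = bank.frequent_pairs()
    candidates = [
        c for c in range(len(best))
        if c != positive
        and best[c] - best[positive] > bank.margin
        and (positive, c) in pairs
    ]
    target = max(candidates, key=lambda c: best[c]) if candidates else positive
    return max(range(len(scores)), key=lambda p: scores[p][target])


def mct_loss_weights(num_classes, misclassified, reduced_weight=0.5):
    """Later stage (MCT): per-class loss weights, lowered for the flagged classes."""
    return [reduced_weight if c in misclassified else 1.0 for c in range(num_classes)]
```

In this sketch, record would be called once per image label during the early epochs, and select_positive_proposal and mct_loss_weights would only be used after an assumed switch-over point later in training.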