Milani Federico, Pinciroli Vago Nicolò Oreste, Fraternali Piero
Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133 Milano, Italy.
J Imaging. 2022 Aug 6;8(8):215. doi: 10.3390/jimaging8080215.
Object detection requires many precise annotations, which are available for natural images but not for many non-natural data sets, such as artwork data sets. One solution is to use Weakly Supervised Object Detection (WSOD) techniques, which learn accurate object localization from image-level labels. Studies have shown that state-of-the-art end-to-end architectures may not be suitable for domains in which the images or classes differ substantially from those used to pre-train the networks. This paper presents a novel two-stage WSOD approach for obtaining accurate bounding boxes on non-natural data sets. The proposed method exploits existing classification knowledge to generate pseudo-ground-truth bounding boxes from Class Activation Maps (CAMs). The automatically generated annotations are then used to train a robust Faster R-CNN object detector. Quantitative and qualitative analyses show that bounding boxes generated from CAMs can compensate for the lack of manually annotated ground truth (GT) and that an object detector trained with such pseudo-GT surpasses state-of-the-art end-to-end WSOD methods on two artwork data sets, ArtDL 2.0 (≈41.5% mAP) and IconArt (≈17% mAP). The proposed solution is a step towards the computer-aided study of non-natural images and opens the way to more advanced tasks, e.g., automatic artwork image captioning for digital archive applications.
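To make the first stage concrete, the following minimal sketch shows one common way a bounding box can be derived from a CAM: min-max normalize the map, keep the cells above a relative threshold, and take the tightest enclosing box. The abstract does not specify the authors' exact procedure, so the function name `cam_to_box`, the 0.5 threshold, and the single-box simplification are all assumptions made for illustration. A real pipeline would likely operate on arrays and separate connected regions.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) as inclusive cell indices of the CAM grid
Box = Tuple[int, int, int, int]


def cam_to_box(cam: List[List[float]], rel_threshold: float = 0.5) -> Optional[Box]:
    """Hypothetical helper: tightest box around strongly activated CAM cells.

    Assumption: a cell counts as "activated" when its min-max normalized
    value is >= rel_threshold. This is an illustrative rule, not the
    procedure from the paper.
    """
    values = [v for row in cam for v in row]
    if not values:
        return None
    lo, hi = min(values), max(values)
    if hi == lo:
        return None  # flat map carries no localization evidence

    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(cam):
        for x, v in enumerate(row):
            if (v - lo) / (hi - lo) >= rel_threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # can happen only when rel_threshold > 1
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # Toy 4x4 activation map with a hot region in the center-right.
    toy_cam = [
        [0.0, 0.1, 0.0, 0.0],
        [0.1, 0.8, 0.9, 0.0],
        [0.0, 0.7, 1.0, 0.1],
        [0.0, 0.0, 0.2, 0.0],
    ]
    print(cam_to_box(toy_cam))  # (1, 1, 2, 2)
```

Boxes obtained this way, rescaled from CAM resolution to image resolution and paired with the image-level class label, would then serve as the pseudo-GT used to train the detector in the second stage.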