Xu Lian, Bennamoun Mohammed, Boussaid Farid, Ouyang Wanli, Sohel Ferdous, Xu Dan
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5082-5096. doi: 10.1109/TNNLS.2024.3373566. Epub 2025 Feb 28.
Most existing weakly supervised semantic segmentation (WSSS) methods rely on class activation mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an offline heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps, generated by a general-purpose pretrained saliency model, to produce more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework that exploits the rich information in these saliency maps and the strong inter-task correlation between saliency detection and semantic segmentation. In the proposed AuxSegNet+, saliency detection and multilabel image classification serve as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module to learn both pairwise and unary affinities, which are used to enhance the task-specific features and predictions by aggregating both query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate the CAM maps, providing better pseudo-labels for both tasks. Cross-task affinity learning and pseudo-label updating together enable iterative improvement of segmentation performance. Extensive experiments demonstrate the effectiveness of the proposed approach, with new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks.
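To make the pseudo-label generation step concrete, the following is a minimal NumPy sketch of the common heuristic that fuses CAM maps with an off-the-shelf saliency map via thresholding. The threshold values, the 255 ignore-index convention, and all function and argument names are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def make_pseudo_label(cams: np.ndarray, saliency: np.ndarray,
                      image_labels: np.ndarray, fg_thresh: float = 0.3,
                      ignore_index: int = 255) -> np.ndarray:
    """cams: (C, H, W) class activation maps, normalized to [0, 1].
    saliency: (H, W) saliency map in [0, 1] from a pretrained model.
    image_labels: (C,) binary image-level labels.
    Returns an (H, W) pseudo-label map (0 = background)."""
    cams = cams * image_labels[:, None, None]       # keep only classes present in the image
    score = cams.max(axis=0)                        # strongest class response per pixel
    label = cams.argmax(axis=0) + 1                 # foreground class ids start at 1
    label[score < fg_thresh] = 0                    # weak response -> background
    # Saliency arbitrates conflicts: salient pixels with no confident class,
    # and non-salient pixels claimed as foreground, are marked as ignored.
    label[(saliency > 0.5) & (label == 0)] = ignore_index
    label[(saliency <= 0.5) & (label > 0)] = ignore_index
    return label
```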
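The refinement step described in the abstract propagates CAM activations along the learned cross-task pairwise affinity. Below is a minimal sketch of that propagation under a simplifying assumption: a cosine-similarity affinity built directly from a task feature map stands in for the paper's learned dual-affinity module, and the iteration count `n_iters` is an arbitrary illustrative choice.

```python
import numpy as np

def refine_cam_with_affinity(cam: np.ndarray, feat: np.ndarray,
                             n_iters: int = 2) -> np.ndarray:
    """cam: (C, H, W) coarse class activation maps.
    feat: (D, H, W) task feature map (e.g., saliency or segmentation features).
    Propagates CAM activations over a row-normalized pairwise affinity
    matrix, in the style of random-walk label propagation."""
    C, H, W = cam.shape
    f = feat.reshape(feat.shape[0], -1)                    # (D, HW)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    aff = np.maximum(f.T @ f, 0)                           # (HW, HW) non-negative similarity
    aff = aff / (aff.sum(axis=1, keepdims=True) + 1e-8)    # row-normalize -> transition matrix
    x = cam.reshape(C, -1).T                               # (HW, C) activations per pixel
    for _ in range(n_iters):
        x = aff @ x                                        # spread activations to similar pixels
    return x.T.reshape(C, H, W)
```

Re-thresholding the refined maps (as in the sketch above) yields updated pseudo-labels, which is how the iterative improvement loop described in the abstract would close.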