Robust Fine-Grained Visual Recognition With Neighbor-Attention Label Correction.

Author information

Mao Shunan, Zhang Shiliang

Publication information

IEEE Trans Image Process. 2024;33:2614-2626. doi: 10.1109/TIP.2024.3378461. Epub 2024 Apr 3.

Abstract

Existing deep learning methods for fine-grained visual recognition often rely on large-scale, well-annotated training data. Obtaining fine-grained annotations in the wild, such as fine category annotation for species recognition, instance annotation for person re-identification (re-id), and dense annotation for segmentation, typically demands concentration and expertise, so label noise is inevitable. This paper aims to tackle label noise in deep model training for fine-grained visual recognition. We propose a Neighbor-Attention Label Correction (NALC) model that corrects labels during the training stage. NALC samples a training batch and a validation batch from the training set, then uses a meta-learning framework to correct the labels in the training batch based on the validation batch. To improve optimization efficiency, we introduce a novel nested optimization algorithm for this meta-learning framework. The proposed training procedure consistently improves label accuracy in the training batch and, consequently, the learned image representation. Experimental results show that our method increases label accuracy from 70% to over 98% and outperforms recent approaches by up to 13.4% in mean Average Precision (mAP) on various fine-grained image retrieval (FGIR) tasks, including instance retrieval on CUB200 and person re-id on Market1501. We also demonstrate the efficacy of NALC on noisy semantic segmentation datasets generated from Cityscapes, where it improves the mIoU score by 7.8%. NALC is also robust to different types of noise, including simulated noise (Asymmetric, Pair-Flip, and Pattern noise) and practical noisy labels generated by tracklets and clustering.
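The abstract does not spell out NALC's update rule, so the following is only a hypothetical sketch of the general idea suggested by the name "neighbor-attention label correction": each training-batch sample receives a soft label formed by attention over the labels of its nearest neighbors in the validation batch, and its hard label is replaced when those neighbors agree strongly on a different class. The function names, the cosine-similarity attention, and the parameters `k`, `tau`, and `threshold` are assumptions made for illustration. The paper's meta-learning framework and nested optimization algorithm are not reproduced here, and because the validation batch is drawn from the same noisy training set, its labels would themselves be partly noisy in practice.

```python
import numpy as np

def neighbor_attention_soft_labels(train_feats, val_feats, val_labels,
                                   num_classes, k=5, tau=0.1):
    """Hypothetical sketch: soft labels for a training batch from attention over
    its k nearest neighbors in a validation batch (not the paper's exact method)."""
    # L2-normalize so that dot products are cosine similarities.
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    v = val_feats / np.linalg.norm(val_feats, axis=1, keepdims=True)
    sim = t @ v.T                                   # (n_train, n_val)
    k = min(k, v.shape[0])
    # Indices and similarities of the k most similar validation samples.
    nn_idx = np.argsort(-sim, axis=1)[:, :k]        # (n_train, k)
    nn_sim = np.take_along_axis(sim, nn_idx, axis=1)
    # Attention weights: temperature-scaled softmax over neighbor similarities.
    logits = nn_sim / tau
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    # One-hot labels of the neighbors, shape (n_train, k, num_classes).
    onehot = np.eye(num_classes)[val_labels[nn_idx]]
    return (attn[:, :, None] * onehot).sum(axis=1)  # (n_train, num_classes)

def correct_labels(noisy_labels, soft_labels, threshold=0.5):
    """Hypothetical rule: replace a label only when the neighbors' soft label
    prefers a different class with weight above `threshold`."""
    pred = soft_labels.argmax(axis=1)
    conf = soft_labels.max(axis=1)
    return np.where((pred != noisy_labels) & (conf > threshold), pred, noisy_labels)

# Toy demo: 3 well-separated classes in an 8-D embedding space.
rng = np.random.default_rng(0)
centers = np.eye(3, 8) * 5.0
val_labels = np.repeat(np.arange(3), 10)
val_feats = centers[val_labels] + rng.normal(scale=0.3, size=(30, 8))
true_labels = np.array([0, 1, 2, 0])
train_feats = centers[true_labels] + rng.normal(scale=0.3, size=(4, 8))
noisy = np.array([0, 2, 2, 1])                      # samples 1 and 3 mislabeled
soft = neighbor_attention_soft_labels(train_feats, val_feats, val_labels, 3)
print(correct_labels(noisy, soft))
```

In this toy setup the embeddings are fixed; in a real training loop the features would come from the network being trained, which is where a method like NALC couples label correction with representation learning.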

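The abstract lists Asymmetric, Pair-Flip, and Pattern noise among the simulated noise types but does not define them. As a point of reference, the sketch below follows the pair-flip definition common in the noisy-label literature: with probability `rate`, a label of class i is moved to class (i + 1) mod C. Writing the corruption as a row-stochastic transition matrix T, with T[i, j] = P(noisy label = j | clean label = i), lets the same sampler be reused for other class-dependent (asymmetric) patterns. The helper names and the 30% rate in the demo are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def pair_flip_matrix(num_classes: int, rate: float) -> np.ndarray:
    """Row-stochastic transition matrix for pair-flip noise (hypothetical helper).

    T[i, j] = P(noisy = j | clean = i): keep class i with probability 1 - rate,
    otherwise flip it to class (i + 1) mod num_classes.
    """
    T = np.eye(num_classes) * (1.0 - rate)
    for i in range(num_classes):
        T[i, (i + 1) % num_classes] += rate
    return T

def corrupt_labels(labels: np.ndarray, T: np.ndarray,
                   rng: np.random.Generator) -> np.ndarray:
    """Draw a noisy label for each clean label from the row T[label]."""
    noisy = np.empty_like(labels)
    for idx, y in enumerate(labels):
        noisy[idx] = rng.choice(T.shape[1], p=T[y])
    return noisy

# Demo: 10 classes, 1000 clean samples each, 30% pair-flip noise (arbitrary choice).
clean = np.repeat(np.arange(10), 1000)
T = pair_flip_matrix(10, 0.3)
noisy = corrupt_labels(clean, T, np.random.default_rng(0))
print("label accuracy:", (noisy == clean).mean())  # close to 1 - rate
```

Replacing `pair_flip_matrix` with a different transition matrix, for example one that sends each class to a specific visually similar class, gives a class-dependent asymmetric variant without changing the sampler.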