
Introspective Deep Metric Learning

Authors

Wang Chengkun, Zheng Wenzhao, Zhu Zheng, Zhou Jie, Lu Jiwen

Publication

IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):1964-1980. doi: 10.1109/TPAMI.2023.3312311. Epub 2024 Mar 6.

Abstract

This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, but ignore the uncertainty present in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce overconfident judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describe the semantic characteristics and the ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. Our framework attains state-of-the-art performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval. We further evaluate our framework for image classification on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets, which shows that equipping existing data mixing methods with the proposed introspective metric consistently achieves better results (e.g., +0.44% for CutMix on ImageNet-1K).
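The abstract describes pairing each semantic embedding with an uncertainty embedding and comparing images with a metric that accounts for both. The sketch below illustrates one plausible form of such a metric — a squared semantic distance with additive uncertainty penalties — as a minimal, hypothetical example; the function name and exact formulation are assumptions for illustration, not the paper's definition.

```python
# Illustrative sketch of an uncertainty-aware (introspective-style) distance.
# Assumption: the distance is the squared semantic gap plus the squared norms
# of both images' uncertainty embeddings, so ambiguous images compare as
# "farther apart" even when their semantic embeddings coincide.
import numpy as np

def introspective_distance(sem_a, unc_a, sem_b, unc_b):
    """Hypothetical metric: semantic discrepancy plus uncertainty penalties."""
    semantic = np.sum((sem_a - sem_b) ** 2)          # squared semantic distance
    uncertainty = np.sum(unc_a ** 2) + np.sum(unc_b ** 2)  # ambiguity terms
    return semantic + uncertainty

# Two images with identical semantic embeddings but nonzero uncertainty
# still have a positive distance under this formulation.
sem = np.ones(4)
d_uncertain = introspective_distance(sem, 0.1 * np.ones(4), sem, np.zeros(4))
d_certain = introspective_distance(sem, np.zeros(4), sem, np.zeros(4))
```

With zero uncertainty the metric reduces to the ordinary squared Euclidean distance, which matches the abstract's framing of the uncertainty embedding as an additional, separate description of ambiguity.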

