Zhu Sijie, Yang Taojiannan, Chen Chen
IEEE Trans Image Process. 2021;30:7593-7607. doi: 10.1109/TIP.2021.3107214. Epub 2021 Sep 8.
This work explores visual explanation for deep metric learning and its applications. As an important problem in representation learning, metric learning has attracted much attention recently, yet the interpretation of metric learning models is not as well studied as that of classification models. To this end, we propose an intuitive approach that shows which regions contribute the most to the overall similarity of two input images by decomposing the final activation. Instead of only providing an overall activation map for each image, we propose to generate point-to-point activation intensity between two images so that the relationship between different regions is uncovered. We show that the proposed framework can be directly applied to a wide range of metric learning applications and provides valuable information for model understanding. Both theoretical and empirical analyses are provided to demonstrate the superiority of the proposed overall activation map over existing methods. Furthermore, our experiments validate the effectiveness of the proposed point-specific activation map on two applications, i.e., cross-view pattern discovery and interactive retrieval. Code is available at https://github.com/Jeff-Zilence/Explain_Metric_Learning.
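The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that). It assumes the simplest possible setting: each embedding is the global average pool of a final convolutional feature map, and similarity is an unnormalized dot product. Under those assumptions the similarity expands exactly into a sum of dot products between every pair of spatial positions, which gives a point-to-point activation, and summing over one image's positions gives an overall activation map for the other. The function names and the random feature maps are hypothetical stand-ins for real network outputs.

```python
import numpy as np


def point_to_point_activation(feat_a, feat_b):
    """Split a dot-product similarity into per-position-pair terms.

    feat_a: (Ha, Wa, C) final feature map of image A (assumed input).
    feat_b: (Hb, Wb, C) final feature map of image B (assumed input).
    Returns an array of shape (Ha, Wa, Hb, Wb) whose entries sum to the
    dot product of the two globally average-pooled embeddings.
    """
    ha, wa, c = feat_a.shape
    hb, wb, _ = feat_b.shape
    a = feat_a.reshape(-1, c)  # (Ha*Wa, C)
    b = feat_b.reshape(-1, c)  # (Hb*Wb, C)
    # Pairwise dot products, scaled by the two pooling factors 1/Na and 1/Nb.
    p2p = (a @ b.T) / (a.shape[0] * b.shape[0])
    return p2p.reshape(ha, wa, hb, wb)


def overall_activation_maps(p2p):
    """Marginalize the point-to-point activation into one map per image."""
    map_a = p2p.sum(axis=(2, 3))  # contribution of each position in A
    map_b = p2p.sum(axis=(0, 1))  # contribution of each position in B
    return map_a, map_b


# Hypothetical feature maps standing in for real network outputs.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((4, 4, 8))
feat_b = rng.standard_normal((3, 5, 8))

p2p = point_to_point_activation(feat_a, feat_b)
map_a, map_b = overall_activation_maps(p2p)

# Sanity check: the decomposition recovers the pooled dot-product similarity.
similarity = feat_a.mean(axis=(0, 1)) @ feat_b.mean(axis=(0, 1))
assert np.isclose(p2p.sum(), similarity)
```

Fixing one position in image A, for example `p2p[1, 2]`, gives a map over image B showing which regions match that point; this is the kind of query a point-specific map would support. Embedding normalization, other pooling layers, and fully connected heads are outside the scope of this sketch.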