Fu Zhenyong, Xiang Tao, Kodirov Elyor, Gong Shaogang
IEEE Trans Pattern Anal Mach Intell. 2018 Aug;40(8):2009-2022. doi: 10.1109/TPAMI.2017.2737007. Epub 2017 Aug 7.
Zero-Shot Learning (ZSL) for visual recognition is typically achieved by exploiting a semantic embedding space. In such a space, both seen and unseen class labels as well as image features can be embedded, so that the similarity among them can be measured directly. In this work, we argue that the key to effective ZSL is to compute an optimal distance metric in the semantic embedding space. Existing ZSL works employ either Euclidean or cosine distances. However, in a high-dimensional space where the projected class labels (prototypes) are sparse, these distances are suboptimal, leading to a number of problems including hubness and domain shift. To overcome these problems, a novel manifold distance computed on a semantic class prototype graph is proposed, which takes into account the rich intrinsic semantic structure, i.e., the semantic manifold, of the class prototype distribution. To further alleviate the domain shift problem, a new regularisation term is introduced into a ranking-loss-based embedding model. Specifically, the ranking loss objective is regularised by the unseen class prototypes to prevent the projected object features from being biased towards the seen prototypes. Extensive experiments on four benchmarks show that our method significantly outperforms the state of the art.
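To make the manifold-distance idea concrete, the sketch below builds a k-nearest-neighbour graph over the class prototypes and measures distance along the graph (geodesic distance) instead of plain Euclidean distance. This is only a minimal illustration of the general idea under simplifying assumptions; the paper's actual graph construction and distance computation are not reproduced here, and all function names and parameters (e.g. `k`) are hypothetical.

```python
# Minimal sketch (not the paper's exact method): geodesic distance on a
# class prototype graph, used in place of Euclidean distance for ZSL.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def prototype_graph_distance(points, k=3):
    """Build a k-NN graph over the rows of `points` (class prototypes,
    plus possibly projected image features) and return pairwise geodesic
    distances along that graph."""
    # Edge weights are Euclidean distances between connected nodes.
    knn = kneighbors_graph(points, n_neighbors=k, mode='distance')
    # Symmetrise so the graph is undirected before shortest-path search.
    knn = 0.5 * (knn + knn.T)
    return shortest_path(knn, method='D', directed=False)

def classify_by_manifold_distance(embedded_image, prototypes, k=3):
    """Assign the class whose prototype is nearest to the embedded image
    under the graph (manifold) distance."""
    nodes = np.vstack([prototypes, embedded_image])  # image is the last node
    dists = prototype_graph_distance(nodes, k=k)
    return int(np.argmin(dists[-1, :-1]))            # nearest prototype index
```

The second idea, regularising a ranking loss with unseen class prototypes, could be realised along the lines below. Again this is an assumption-laden sketch of one plausible formulation (hinge ranking term plus a penalty on the distance to the closest unseen prototype), not the paper's exact objective; the margin and the weight `lam` are illustrative.

```python
# Hypothetical sketch of a ranking loss regularised by unseen prototypes.
import numpy as np

def regularised_ranking_loss(W, x, y_true, seen_protos, unseen_protos,
                             margin=0.1, lam=0.1):
    """Hinge ranking loss for one image `x` with seen-class label `y_true`,
    plus a regulariser keeping the projection W @ x close to its nearest
    unseen prototype, discouraging bias towards the seen prototypes."""
    z = W @ x                                    # project image into semantic space
    pos = z @ seen_protos[y_true]                # score of the correct prototype
    # Margin-based ranking term over the incorrect seen prototypes.
    rank = sum(max(0.0, margin - pos + z @ p)
               for i, p in enumerate(seen_protos) if i != y_true)
    # Regulariser: distance from the projection to the closest unseen prototype.
    reg = np.min(np.linalg.norm(unseen_protos - z, axis=1))
    return rank + lam * reg
```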