Scalable Zero-Shot Learning via Binary Visual-Semantic Embeddings.

Authors

Shen Fumin, Zhou Xiang, Yu Jun, Yang Yang, Liu Li, Shen Heng Tao

Publication information

IEEE Trans Image Process. 2019 Feb 18. doi: 10.1109/TIP.2019.2899987.

Abstract

Zero-shot learning aims to classify visual instances from unseen classes in the absence of training examples. This is typically achieved by directly mapping visual features to a semantic embedding space of classes (e.g., attributes or word vectors), where the similarity between the two modalities can be readily measured. However, the semantic space may not be reliable for recognition due to noisy class embeddings or the visual bias problem. In this work, we propose a novel Binary embedding based Zero-Shot Learning (BZSL) method, which recognizes visual instances from unseen classes through an intermediate discriminative Hamming space. Specifically, BZSL jointly learns two binary coding functions that encode both visual instances and class embeddings into the Hamming space, which alleviates the visual-semantic bias problem. As a desirable property, an unseen instance can then be classified efficiently by retrieving the class code with the minimal Hamming distance. During training, by introducing two auxiliary variables for the coding functions, we formulate an equivalent correlation maximization problem, which admits an analytical solution. The resulting algorithm thus enjoys both highly efficient training and scalable inference on novel classes. Extensive experiments on four benchmark datasets, including the full ImageNet Fall 2011 dataset with over 20K unseen classes, demonstrate the superiority of our method on the zero-shot learning task. In particular, we show that increasing the binary embedding dimension consistently improves recognition accuracy.
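
The inference step described in the abstract (assigning an unseen instance to the class whose binary code is nearest in Hamming distance) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the form of the two coding functions, so the `binary_codes` helper below assumes a sign-of-linear-projection encoder, and the function names, the projection matrix, and the toy codes are all hypothetical. The learning of the coding functions (the correlation maximization step) is not shown.

```python
import numpy as np


def binary_codes(features, projection):
    """Encode each row of `features` as a {-1, +1} code.

    ASSUMPTION: a linear projection followed by sign(); the abstract does
    not state the actual form of the coding functions used in BZSL.
    """
    return np.where(features @ projection >= 0, 1, -1).astype(np.int8)


def hamming_distances(codes_a, codes_b):
    """Pairwise Hamming distances between two sets of {-1, +1} codes."""
    n_bits = codes_a.shape[1]
    # For +/-1 codes: inner product = n_bits - 2 * Hamming distance.
    inner = codes_a.astype(np.int32) @ codes_b.astype(np.int32).T
    return (n_bits - inner) // 2


def classify(visual_codes, class_codes):
    """Return, for each instance, the index of the nearest class code."""
    return np.argmin(hamming_distances(visual_codes, class_codes), axis=1)


# Toy data (hypothetical): three 8-bit class codes.
class_codes = np.array(
    [
        [1, 1, 1, 1, -1, -1, -1, -1],
        [-1, -1, -1, -1, 1, 1, 1, 1],
        [1, -1, 1, -1, 1, -1, 1, -1],
    ],
    dtype=np.int8,
)

# Three "visual" codes: copies of classes 2, 0, 1 with the first bit flipped,
# standing in for noisy encodings of instances from those classes.
visual_codes = class_codes[[2, 0, 1]].copy()
visual_codes[:, 0] *= -1

predictions = classify(visual_codes, class_codes)
```

Because the distance computation reduces to one integer matrix product followed by an `argmin`, the cost of classifying an instance grows linearly with the number of candidate classes and the code length, which is one plausible reading of why the abstract calls the approach scalable to over 20K unseen classes.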

