Yu Yunlong, Ji Zhong, Guo Jichang, Pang Yanwei
IEEE Trans Neural Netw Learn Syst. 2018 Sep;29(9):4116-4127. doi: 10.1109/TNNLS.2017.2753852. Epub 2017 Oct 12.
Zero-shot learning (ZSL) endows a computer vision system with the inferential capability to recognize new categories that it has never seen before. Its two fundamental challenges are visual-semantic embedding, which arises in the cross-modality learning step, and domain adaptation, which arises in the unseen-class prediction step. This paper presents a method for each challenge: Adaptive STructural Embedding (ASTE) and the Self-PAced Selective Strategy (SPASS). Specifically, ASTE formulates the visual-semantic interactions in a latent structural support vector machine framework and adaptively adjusts the slack variables to reflect the differing reliability of training instances. To alleviate the domain shift problem in ZSL, SPASS borrows from self-paced learning: it iteratively selects unseen instances, from the most reliable to the less reliable, to gradually adapt knowledge from the seen domain to the unseen domain. By combining SPASS and ASTE, we obtain a self-paced Transductive ASTE (TASTE) method that progressively reinforces the classification capacity. Extensive experiments on three benchmark data sets (AwA, CUB, and aPY) demonstrate the superiority of ASTE and TASTE. Furthermore, we propose a fast training (FT) strategy to improve the efficiency of most existing ZSL methods. The FT strategy is surprisingly simple and general: it speeds up the training of most existing ZSL methods by a factor of 4 to 300 while maintaining their original performance.
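The reliable-to-less-reliable selection idea behind SPASS can be illustrated with a minimal sketch. This is not the authors' formulation: it swaps the latent structural SVM for a plain nearest-prototype classifier, and the function name `self_paced_select`, the margin-based reliability score, the linear pacing schedule, and the 0.5 blending weight are all assumptions made for illustration only.

```python
import numpy as np

def self_paced_select(features, prototypes, rounds=3):
    """Pseudo-label unseen instances, most confident first (illustrative only).

    features:   (n, d) embeddings of unlabelled unseen-class instances (assumed input)
    prototypes: (c, d) initial unseen-class prototypes, c >= 2 (assumed input)
    Returns the final pseudo-labels and the adapted prototypes.
    """
    features = np.asarray(features, dtype=float)
    protos = np.asarray(prototypes, dtype=float).copy()
    n = features.shape[0]
    labels = np.full(n, -1)
    for r in range(1, rounds + 1):
        # Squared Euclidean distance from every instance to every prototype.
        dists = ((features[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        pred = dists.argmin(axis=1)
        # Reliability (assumed): gap between the two closest prototypes.
        sorted_d = np.sort(dists, axis=1)
        margin = sorted_d[:, 1] - sorted_d[:, 0]
        # Pacing (assumed): admit a linearly growing share of instances.
        k = int(np.ceil(n * r / rounds))
        selected = np.argsort(-margin)[:k]
        labels[:] = -1
        labels[selected] = pred[selected]
        # Adapt each prototype toward the mean of its selected instances.
        for c in range(protos.shape[0]):
            members = features[labels == c]
            if len(members) > 0:
                protos[c] = 0.5 * protos[c] + 0.5 * members.mean(axis=0)
    return labels, protos
```

Calling `self_paced_select(feats, protos, rounds=3)` labels roughly a third of the instances in the first round and all of them by the last, so early prototype updates rely only on the instances the current model separates most clearly.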