IEEE Trans Image Process. 2019 Apr;28(4):1824-1836. doi: 10.1109/TIP.2018.2881926. Epub 2018 Nov 16.
Zero-shot learning (ZSL) for visual recognition aims to recognize objects of unseen classes by mapping visual features to an embedding space spanned by class semantic information. However, the semantic gap between visual features and their underlying semantics remains a major obstacle in ZSL. Conventional ZSL methods that construct this mapping typically operate on the original visual features, which are extracted independently of the ZSL task, and this degrades prediction performance. In this paper, we propose an effective method to uncover a latent representation of the data that is suited to zero-shot classification. Specifically, we formulate a novel framework that jointly learns a latent subspace and a cross-modal embedding linking visual features with their semantic representations. The proposed framework combines feature learning and semantics prediction, so that the learned data representation is more discriminative for predicting the semantic vectors, which improves the overall classification performance. To learn a robust latent subspace, we explicitly avoid information loss by ensuring that the obtained data representation can reconstruct the original features. An efficient algorithm is designed to solve the proposed optimization problem. To fully exploit the intrinsic geometric structure of the data, we develop a manifold regularization strategy that refines the learned semantic representations, leading to further improvements in classification performance. To validate the effectiveness of the proposed approach, extensive experiments are conducted on three ZSL benchmarks, and encouraging results are achieved compared with state-of-the-art ZSL methods.
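The abstract describes the framework only in words, so the following is a minimal illustrative sketch and not the authors' formulation. It assumes a simple objective, ||X - PZ||² + α||S - WZ||² + λ(||P||² + ||W||² + ||Z||²), in which a shared latent code Z must both reconstruct the visual features X and predict the semantic vectors S, and it solves the problem by alternating ridge-regression updates. The names `fit_latent_embedding`, `predict`, `P`, `W`, `Z`, `alpha`, and `lam` are all hypothetical, and the manifold regularization step is omitted.

```python
import numpy as np


def objective(X, S, P, W, Z, alpha, lam):
    """Assumed joint loss: reconstruction + semantic prediction + ridge terms."""
    rec = np.sum((X - P @ Z) ** 2)
    sem = alpha * np.sum((S - W @ Z) ** 2)
    reg = lam * (np.sum(P ** 2) + np.sum(W ** 2) + np.sum(Z ** 2))
    return rec + sem + reg


def fit_latent_embedding(X, S, m=5, alpha=1.0, lam=0.1, n_iter=30, seed=0):
    """Alternating least squares on seen-class data.

    X: (d, n) visual features; S: (k, n) semantic vector of each sample's class.
    Returns the decoder P (d, m), the semantic map W (k, m), and the loss history.
    """
    rng = np.random.default_rng(seed)
    d, k = X.shape[0], S.shape[0]
    P = rng.standard_normal((d, m))
    W = rng.standard_normal((k, m))
    I = np.eye(m)
    history = []
    for _ in range(n_iter):
        # Latent codes: exact minimizer with P and W held fixed.
        Z = np.linalg.solve(P.T @ P + alpha * W.T @ W + lam * I,
                            P.T @ X + alpha * W.T @ S)
        G = Z @ Z.T
        # Decoder: ridge regression of X on Z (keeps Z able to reconstruct X).
        P = np.linalg.solve(G + lam * I, Z @ X.T).T
        # Semantic map: ridge regression of S on Z.
        W = np.linalg.solve(G + (lam / alpha) * I, Z @ S.T).T
        history.append(objective(X, S, P, W, Z, alpha, lam))
    return P, W, history


def predict(X_test, P, W, prototypes, lam=0.1):
    """Assign each test sample to the unseen class with the closest prototype.

    prototypes: (k, c) semantic vectors of the c unseen classes.
    """
    m = P.shape[1]
    # Infer latent codes from visual features alone, using the decoder P.
    Z = np.linalg.solve(P.T @ P + lam * np.eye(m), P.T @ X_test)
    S_hat = W @ Z
    # Cosine similarity between predicted semantics and class prototypes.
    A = prototypes / (np.linalg.norm(prototypes, axis=0, keepdims=True) + 1e-12)
    B = S_hat / (np.linalg.norm(S_hat, axis=0, keepdims=True) + 1e-12)
    return np.argmax(A.T @ B, axis=0)
```

Because each update exactly minimizes the assumed objective over one block of variables, the recorded loss should not increase from one sweep to the next. A typical call would be `P, W, hist = fit_latent_embedding(X_seen, S_seen)` followed by `predict(X_unseen, P, W, unseen_prototypes)`, where the array names are placeholders for the user's own data.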