Liu Yang, Gao Xinbo, Han Jungong, Shao Ling
IEEE Trans Cybern. 2023 Jun;53(6):3794-3805. doi: 10.1109/TCYB.2022.3164142. Epub 2023 May 17.
Zero-shot learning (ZSL) aims to classify unseen samples based on the relationship between learned visual features and semantic features. Traditional ZSL methods typically capture the underlying multimodal data structure by learning an embedding function between the visual space and the semantic space under the Euclidean metric. However, these models suffer from the hubness problem and the domain bias problem, which lead to unsatisfactory performance, especially on the generalized ZSL (GZSL) task. To tackle these problems, we formulate a discriminative cross-aligned variational autoencoder (DCA-VAE) for ZSL. The proposed model employs a modified cross-modal-alignment variational autoencoder (VAE) to transform both visual features and semantic features, obtained under the discriminative cosine metric, into latent features. The key to our method is that we collect principal discriminative information from the visual and semantic features to construct latent features that contain the discriminative multimodal information associated with unseen samples. Finally, DCA-VAE is validated on six benchmarks, including the large-scale ImageNet dataset, and the experimental results demonstrate its superiority over most existing embedding-based or generative ZSL models on both the standard ZSL task and the more realistic GZSL task.
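The abstract describes a cross-modal-alignment VAE that maps visual and semantic features into a shared latent space and pairs it with a cosine-based discriminative metric. Below is a minimal PyTorch sketch of that core idea, not the authors' implementation: the layer sizes, loss weights (beta, gamma), and the specific cosine-alignment term are assumptions chosen for illustration.

```python
# Hypothetical sketch of a cross-aligned VAE for ZSL (not the authors' DCA-VAE code).
# Two modality-specific VAEs share a latent space; each decoder also reconstructs the
# other modality's input (cross-alignment), and paired latent codes are pulled together
# with a cosine criterion. Dimensions and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityVAE(nn.Module):
    def __init__(self, in_dim, latent_dim=64, hidden=512):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

def cross_aligned_loss(x_v, x_s, vae_v, vae_s, beta=1.0, gamma=1.0):
    """Self-reconstruction + KL + cross-reconstruction + cosine latent alignment."""
    mu_v, lv_v = vae_v.encode(x_v)
    mu_s, lv_s = vae_s.encode(x_s)
    z_v = vae_v.reparameterize(mu_v, lv_v)
    z_s = vae_s.reparameterize(mu_s, lv_s)

    # Reconstruct each modality from its own latent code.
    rec = F.mse_loss(vae_v.dec(z_v), x_v) + F.mse_loss(vae_s.dec(z_s), x_s)
    # Cross-reconstruction: decode each modality from the other modality's latent code.
    cross = F.mse_loss(vae_v.dec(z_s), x_v) + F.mse_loss(vae_s.dec(z_v), x_s)
    # KL divergence of both posteriors toward the standard normal prior.
    kl = (-0.5 * (1 + lv_v - mu_v.pow(2) - lv_v.exp()).sum(1).mean()
          - 0.5 * (1 + lv_s - mu_s.pow(2) - lv_s.exp()).sum(1).mean())
    # Pull paired visual/semantic latent means together under a cosine criterion
    # (a simple stand-in for the paper's discriminative cosine metric).
    align = (1.0 - F.cosine_similarity(mu_v, mu_s, dim=1)).mean()
    return rec + beta * kl + gamma * (cross + align)
```

Under this sketch, unseen classes would be handled at test time by encoding their semantic vectors into latent prototypes and assigning a test image to the class whose prototype has the highest cosine similarity with the image's latent code.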