Zhang Chunyu, Li Zhanshan
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China.
Neural Netw. 2025 Mar;183:106964. doi: 10.1016/j.neunet.2024.106964. Epub 2024 Nov 30.
In generalized zero-shot learning (GZSL), a model must recognize both seen and unseen samples even though only seen classes are available during training. Recent methods use disentanglement to make the information contained in visual features semantically relevant, and ensuring the semantic consistency and independence of the disentangled representations is key to better performance. However, several limitations remain. First, because only seen classes are available during training, recognition of unseen samples tends to be poor. Second, the distribution relations of the representation space and the semantic space differ, and ignoring this discrepancy may impair the model's generalization; moreover, instances are correlated with one another, and modeling their interactions can yield more discriminative information, which should not be ignored. Third, the synthesized visual features may not match their corresponding semantic descriptions well, which compromises the learning of semantic consistency. To overcome these challenges, we propose learning discriminative and transferable disentangled representations (DTDR) for generalized zero-shot learning. First, we exploit the estimated class similarities to supervise the relations between seen semantic-matched representations and unseen semantic descriptions, thereby gaining better insight into the unseen domain. Second, we use the cosine similarities between semantic descriptions to constrain the similarities between semantic-matched representations, so that the distribution relation of the semantic-matched representation space approximates that of the semantic space; this process also takes instance-level correlation into account.
Third, we reconstruct the synthesized visual features into their corresponding semantic descriptions to better establish the associations between them. Experimental results on four datasets verify the effectiveness of our method.
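The distribution-alignment idea in the second point, constraining the pairwise similarities of semantic-matched representations to match the pairwise cosine similarities of the semantic descriptions, can be sketched in pure Python. This is a minimal illustration under our own naming, not the authors' implementation; the function names and the choice of a mean-squared penalty are assumptions for exposition.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def distribution_alignment_loss(reps, sems):
    """Mean squared gap between the pairwise cosine similarities of the
    semantic-matched representations (reps) and those of the corresponding
    semantic descriptions (sems). Driving this to zero makes the distribution
    relation of the representation space approximate that of the semantic
    space, while each pairwise term couples two instances (instance-level
    correlation)."""
    n = len(reps)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = cosine(reps[i], reps[j]) - cosine(sems[i], sems[j])
            total += gap * gap
            count += 1
    return total / count
```

When the representation geometry already mirrors the semantic geometry, the loss is zero; any mismatch between a representation pair and its semantic pair contributes a positive penalty.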
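The third point, reconstructing synthesized visual features back into their semantic descriptions, can likewise be sketched with a linear decoder and a squared-error reconstruction loss. This is a hedged sketch: the decoder form, the loss, and all names here are our assumptions, not the paper's architecture.

```python
def decode(visual, W):
    # Hypothetical linear decoder: semantic_hat = W @ visual.
    return [sum(w * v for w, v in zip(row, visual)) for row in W]

def reconstruction_loss(visual_feats, semantics, W):
    """Mean squared error between semantic descriptions decoded from the
    synthesized visual features and the ground-truth semantic descriptions.
    Minimizing it ties each synthesized feature to its description, which is
    the association the third point aims to strengthen."""
    total = 0.0
    for v, s in zip(visual_feats, semantics):
        s_hat = decode(v, W)
        total += sum((a - b) ** 2 for a, b in zip(s_hat, s))
    return total / len(visual_feats)
```

With an identity decoder and features that already equal their descriptions, the loss is zero; any decoding error contributes a positive squared penalty.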