Togo Ren, Nakagawa Nao, Ogawa Takahiro, Haseyama Miki
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):7529-7541. doi: 10.1109/TNNLS.2024.3404496. Epub 2025 Apr 4.
Disentangled representation learning aims to obtain an independent latent representation without supervisory signals. However, the independence of a representation does not guarantee interpretability that matches human intuition in unsupervised settings. In this article, we introduce conceptual representation learning, an unsupervised strategy for learning a representation together with its concepts. An antonym pair forms a concept, which determines the semantically meaningful axes in the latent space. Since the connection between signifying words and signified notions is arbitrary in natural languages, verbalizing data features helps the representation make sense to humans. We thus construct Conceptual VAE (ConcVAE), a variational autoencoder (VAE)-based generative model with an explicit process in which the semantic representation of data is generated via trainable concepts. On visual data, ConcVAE exploits the arbitrariness of natural language as an inductive bias for unsupervised learning through vision-language pretraining, which can tell an unsupervised model what makes sense to humans. Qualitative and quantitative evaluations show that the conceptual inductive bias in ConcVAE effectively disentangles the latent representation in a sense-making manner without supervision. Code is available at https://github.com/ganmodokix/concvae.
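The core idea — that an antonym pair defines one semantic axis and latent codes are read off as coordinates along those axes — can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors below stand in for text embeddings that, in ConcVAE, would come from a vision-language model's text encoder, and the concept names are hypothetical examples.

```python
import numpy as np

def concept_axis(pos_emb: np.ndarray, neg_emb: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the negative pole to the positive pole
    of an antonym pair; one such vector is one concept axis."""
    axis = pos_emb - neg_emb
    return axis / np.linalg.norm(axis)

def conceptual_coordinates(z: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project a latent vector onto each concept axis (rows of `axes`),
    yielding one human-readable coordinate per concept."""
    return axes @ z

# Toy 4-D embeddings for two hypothetical concepts:
# "bright"/"dark" and "large"/"small".
bright = np.array([1.0, 0.0, 0.0, 0.0])
dark   = np.array([-1.0, 0.0, 0.0, 0.0])
large  = np.array([0.0, 1.0, 0.0, 0.0])
small  = np.array([0.0, -1.0, 0.0, 0.0])

axes = np.stack([concept_axis(bright, dark),
                 concept_axis(large, small)])

z = np.array([0.8, -0.3, 0.5, 0.0])  # a latent code for one image
coords = conceptual_coordinates(z, axes)
print(coords)  # prints [ 0.8 -0.3]: somewhat bright, slightly small
```

In the actual model, the concept embeddings themselves are trainable, and the projection is part of an explicit generative process inside the VAE rather than a post-hoc readout.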