Orthogonal Subspace Representation for Generative Adversarial Networks.

Author information

Jiang Hongxiang, Luo Xiaoyan, Yin Jihao, Fu Huazhu, Wang Fuxiang

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4413-4427. doi: 10.1109/TNNLS.2024.3377436. Epub 2025 Feb 28.

Abstract

Disentanglement learning aims to separate the explanatory factors of variation so that different attributes of the data can be characterized and isolated, which promotes efficient inference in downstream tasks. Mainstream disentanglement approaches based on generative adversarial networks (GANs) learn interpretable data representations. However, most typical GAN-based works do not discuss the latent subspace, and so give insufficient consideration to how independent factors vary. Although some recent studies analyze the latent space of pretrained GANs for image editing, they do not emphasize learning representations directly from the subspace perspective. Appropriate subspace properties could help feature representation learning satisfy the independent-variation requirements of the obtained explanatory factors, which is crucial for better disentanglement. In this work, we propose a unified framework for ensuring disentanglement, which fully investigates latent subspace learning (SL) in GANs. The novel GAN-based architecture, named OSRGAN, explores orthogonal subspace representation (OSR) on a vanilla GAN. To guide the model toward a subspace with strong correlation, low redundancy, and robust distinguishability, our OSR includes three stages: self-latent-aware, orthogonal subspace-aware, and structure representation-aware. First, the self-latent-aware stage encourages the latent subspace to correlate strongly with the data space so that interpretable factors are discovered, although their variation is still poorly independent. Second, the orthogonal subspace-aware stage adaptively learns a set of 1-D linear subspaces spanned by orthogonal bases in the latent space. There is little redundancy between these subspaces, which expresses the corresponding independence. Third, the structure representation-aware stage aligns the projection onto the orthogonal subspace with the latent variables. Accordingly, the feature representation in each linear subspace becomes distinguishable, enhancing the independent expression of interpretable factors. In addition, we design an alternating optimization step that achieves tradeoff training of OSRGAN across these properties. Although it strictly constrains orthogonality, the loss weight of the distinguishability induced by orthogonality can be adjusted and balanced against the correlation constraint. Specifically, this tradeoff training prevents OSRGAN from overemphasizing any single property and damaging the expressiveness of the feature representation. It takes into account both the interpretable factors and their independent variation characteristics. Meanwhile, alternating optimization leaves the cost and efficiency of forward inference unchanged and adds no computational complexity to it. In theory, we clarify the significance of OSR, which brings better independence of factors along with interpretability, as correlation can converge to a high range faster. Moreover, in the convergence behavior analysis, which covers the objective functions under different constraints and the evaluation curve over iterations, our model demonstrates enhanced stability and converges toward a higher peak for disentanglement. To depict performance in downstream tasks, we compare against state-of-the-art GAN-based and even VAE-based approaches on different datasets. Our OSRGAN achieves higher disentanglement scores on the FactorVAE, SAP, MIG, and VP metrics. All the experimental results illustrate that our novel GAN-based framework has considerable advantages in disentanglement.
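
The abstract gives no formulas, so the following is only a minimal NumPy sketch of the two geometric ideas it names: a set of orthogonal bases spanning 1-D linear subspaces, and an alignment between projections onto those subspaces and the latent variables. The function names (`orthonormal_basis`, `orthogonality_penalty`, `project_onto_lines`, `alignment_loss`), the squared-error form of both terms, and the use of QR decomposition to build the basis are all assumptions made for illustration; they are not OSRGAN's actual objectives or training procedure.

```python
import numpy as np


def orthonormal_basis(dim, k, seed=0):
    """Return a (dim, k) matrix whose columns are orthonormal (assumes dim >= k).

    Hypothetical helper: QR decomposition of a random matrix stands in for
    whatever basis the model would learn.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return q


def orthogonality_penalty(basis):
    """Squared Frobenius distance between the Gram matrix and the identity.

    Zero exactly when the columns are orthonormal; an assumed penalty form.
    """
    gram = basis.T @ basis
    return float(np.sum((gram - np.eye(basis.shape[1])) ** 2))


def project_onto_lines(features, basis):
    """Coefficient of each feature vector on each 1-D subspace, shape (n, k)."""
    return features @ basis


def alignment_loss(features, basis, latents):
    """Mean squared gap between projections and latent codes (assumed form)."""
    return float(np.mean((project_onto_lines(features, basis) - latents) ** 2))


if __name__ == "__main__":
    basis = orthonormal_basis(dim=8, k=3)
    latents = np.random.default_rng(1).standard_normal((5, 3))
    # Features built directly from the latents lie in the span of the basis,
    # so projecting them back recovers the latents.
    features = latents @ basis.T
    print("orthogonality penalty:", orthogonality_penalty(basis))
    print("alignment loss:", alignment_loss(features, basis, latents))
```

In this toy setting, a basis with correlated columns gives a positive penalty, which is the kind of redundancy the orthogonal subspace-aware stage is described as reducing.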
