Painsky Amichai, Feder Meir, Tishby Naftali
The Industrial Engineering Department, Tel Aviv University, Tel Aviv 6997801, Israel.
The School of Electrical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel.
Entropy (Basel). 2020 Feb 12;22(2):208. doi: 10.3390/e22020208.
Canonical Correlation Analysis (CCA) is a linear representation learning method that seeks maximally correlated variables in multi-view data. Nonlinear CCA extends this notion to a broader family of transformations, which are more powerful in many real-world applications. Given the joint probability, the Alternating Conditional Expectation (ACE) algorithm provides an optimal solution to the nonlinear CCA problem. However, it suffers from limited performance and an increasing computational burden when only a finite number of samples is available. In this work, we introduce an information-theoretic compressed representation framework for the nonlinear CCA problem (CRCCA), which extends the classical ACE approach. Our suggested framework seeks compact representations of the data that allow a maximal level of correlation. This way, we control the trade-off between the flexibility and the complexity of the model. CRCCA provides theoretical bounds and optimality conditions, as we establish fundamental connections to rate-distortion theory, the information bottleneck and remote source coding. In addition, it allows a soft dimensionality reduction, as the compression level is determined by the mutual information between the original noisy data and the extracted signals. Finally, we introduce a simple implementation of the CRCCA framework, based on lattice quantization.
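As a rough illustration of the classical ACE iteration that the abstract builds on (not of CRCCA itself), the following minimal sketch handles the scalar case, assuming the joint probability is known and given as a matrix over finite alphabets with strictly positive marginals. The function name `ace_discrete` and its parameters are hypothetical and chosen only for this example. Each step replaces one transformation with the conditional expectation of the other and then re-standardizes it to zero mean and unit variance.

```python
import numpy as np

def ace_discrete(joint, n_iter=200, seed=0):
    """Scalar ACE on a known joint pmf (illustrative sketch, hypothetical helper).

    joint[i, j] = P(X = i, Y = j); both marginals are assumed strictly positive,
    and X and Y are assumed to be dependent (otherwise the updates collapse to zero).
    Returns the transformations phi(x), psi(y) and their correlation.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal of X
    py = joint.sum(axis=0)  # marginal of Y

    def standardize(f, p):
        # Zero mean and unit variance under the distribution p.
        f = f - np.dot(p, f)
        return f / np.sqrt(np.dot(p, f ** 2))

    rng = np.random.default_rng(seed)
    psi = standardize(rng.standard_normal(joint.shape[1]), py)
    for _ in range(n_iter):
        phi = standardize(joint @ psi / px, px)    # E[psi(Y) | X = x]
        psi = standardize(joint.T @ phi / py, py)  # E[phi(X) | Y = y]
    rho = float(phi @ joint @ psi)                 # E[phi(X) psi(Y)]
    return phi, psi, rho


# Usage: a symmetric binary pair in which X and Y agree with probability 0.8.
phi, psi, rho = ace_discrete([[0.4, 0.1], [0.1, 0.4]])
print(rho)
```

With only a finite sample, the conditional expectations would have to be estimated from data, which is where the limited performance and growing computational cost mentioned in the abstract come from.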