Aguerri Inaki Estella, Zaidi Abdellatif
IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):120-138. doi: 10.1109/TPAMI.2019.2928806. Epub 2020 Dec 4.
The problem of distributed representation learning is one in which multiple sources of information X_1, …, X_K are processed separately so as to learn as much information as possible about some ground truth Y. We investigate this problem from information-theoretic grounds, through a generalization of Tishby's centralized Information Bottleneck (IB) method to the distributed setting. Specifically, K encoders, K ≥ 2, compress their observations X_1, …, X_K separately in a manner such that, collectively, the produced representations preserve as much information as possible about Y. We study both discrete memoryless (DM) and memoryless vector Gaussian data models. For the discrete model, we establish a single-letter characterization of the optimal tradeoff between complexity (or rate) and relevance (or information) for a class of memoryless sources (the observations X_1, …, X_K being conditionally independent given Y). For the vector Gaussian model, we provide an explicit characterization of the optimal complexity-relevance tradeoff. Furthermore, we develop a variational bound on the complexity-relevance tradeoff that generalizes the evidence lower bound (ELBO) to the distributed setting. We also provide two algorithms for computing this bound: i) a Blahut-Arimoto-type iterative algorithm that computes optimal complexity-relevance encoding mappings by iterating over a set of self-consistent equations, and ii) a variational-inference-type algorithm in which the encoding mappings are parametrized by neural networks and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on synthetic and real datasets are provided to demonstrate the effectiveness of the approaches and algorithms developed in this paper.
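To make the tradeoff concrete, the objective the abstract describes can be written out. The following is a minimal sketch, assuming the natural distributed generalization of Tishby's IB Lagrangian with per-encoder complexity terms and tradeoff weights s_k; the paper's exact Lagrangian and region parameterization may differ.

```latex
% A minimal sketch, assuming the natural distributed generalization of
% Tishby's IB Lagrangian; the paper's exact parameterization may differ.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Centralized IB: a representation $U$ of $X$ is chosen to solve
\[
  \max_{p(u \mid x)} \; I(Y;U) \;-\; \beta\, I(X;U).
\]
Distributed setting of the abstract: each of the $K$ encoders maps its own
observation $X_k$ to a representation $U_k$; relevance is measured jointly
and complexity per encoder,
\[
  \max_{\{p(u_k \mid x_k)\}_{k=1}^{K}} \; I(Y; U_1, \dots, U_K)
  \;-\; \sum_{k=1}^{K} s_k\, I(X_k; U_k),
  \qquad s_k \ge 0.
\]
\end{document}
```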
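The second algorithm lends itself to a compact illustration. Below is a minimal PyTorch sketch of a variational-inference-type training step for K = 2 encoders: each encoder parametrizes a Gaussian q(u_k | x_k), the bound is approximated by a single reparameterized sample, and the result is optimized with stochastic gradient descent. The architecture, the standard-normal prior, the single tradeoff weight S, and all sizes are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of the variational, neural-network-based algorithm the
# abstract describes, for K = 2 encoders. Architecture, Gaussian
# reparameterization, standard-normal prior, and the tradeoff weight S
# are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, X_DIM, Z_DIM, N_CLASSES, S = 2, 20, 8, 4, 1e-2  # hypothetical sizes

class Encoder(nn.Module):
    """Maps one observation X_k to the mean/log-variance of q(u_k | x_k)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(X_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * Z_DIM))
    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar

encoders = nn.ModuleList(Encoder() for _ in range(K))
decoder = nn.Linear(K * Z_DIM, N_CLASSES)  # q(y | u_1, ..., u_K)
opt = torch.optim.Adam(
    list(encoders.parameters()) + list(decoder.parameters()), lr=1e-3)

def kl_std_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ): the complexity penalty.
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)

def step(xs, y):
    """One SGD step on the (negated) variational distributed-IB bound."""
    us, kl = [], 0.0
    for enc, x in zip(encoders, xs):
        mu, logvar = enc(x)
        u = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        us.append(u)
        kl = kl + kl_std_normal(mu, logvar).mean()
    logits = decoder(torch.cat(us, dim=-1))
    relevance = -F.cross_entropy(logits, y)  # bounds I(Y; U_1..U_K) up to a constant
    loss = -(relevance - S * kl)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage on synthetic data: two noisy views of the same label.
y = torch.randint(0, N_CLASSES, (128,))
base = F.one_hot(y, N_CLASSES).float() @ torch.randn(N_CLASSES, X_DIM)
xs = [base + torch.randn(128, X_DIM) for _ in range(K)]
for t in range(200):
    step(xs, y)
```

The split of the objective mirrors how the ELBO decomposes: the cross-entropy of the joint decoder lower-bounds the relevance term up to a constant, while the per-encoder KL terms upper-bound the complexity terms, which is what makes the bound amenable to sampling and gradient-based optimization.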