On the Difference between the Information Bottleneck and the Deep Information Bottleneck.

Author Information

Wieczorek Aleksander, Roth Volker

Affiliations

Department of Mathematics and Computer Science, University of Basel, CH-4051 Basel, Switzerland.

Publication Information

Entropy (Basel). 2020 Jan 22;22(2):131. doi: 10.3390/e22020131.

Abstract

Combining the information bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proven successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the deep variational information bottleneck and the assumptions needed for its derivation. The two assumed properties of the data, X and Y, and their latent representation T take the form of two Markov chains T - X - Y and X - T - Y. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions P(X, Y, T). We, therefore, show how to circumvent this limitation by optimising a lower bound for the mutual information between T and Y: I(T; Y), for which only the latter Markov chain has to be satisfied. The mutual information I(T; Y) can be split into two non-negative parts. The first part is the lower bound for I(T; Y), which is optimised in the deep variational information bottleneck (DVIB) and cognate models in practice. The second part consists of two terms that measure how much the former requirement T - X - Y is violated. Finally, we propose interpreting the family of information bottleneck models as directed graphical models, and show that in this framework, the original and deep information bottlenecks are special cases of a fundamental IB model.
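
For reference, the variational bounds behind the DVIB objective mentioned above can be sketched as follows. This is a minimal sketch in the commonly used formulation, assuming an encoder p(t|x), a variational decoder q(y|t), a variational marginal r(t) and a trade-off parameter β; these symbols are not taken from the abstract, and the paper's own decomposition of I(T; Y) is not reproduced here.

% Minimal sketch of the standard DVIB bounds (assumed common formulation,
% not this paper's exact derivation).
\begin{align}
  % Variational lower bound on the relevance term; sampling t from p(t|x)
  % given x alone uses p(t|x,y) = p(t|x), i.e. the Markov chain T - X - Y
  % discussed in the abstract:
  I(T;Y) &\geq \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p(t|x)}\bigl[\log q(y|t)\bigr] + H(Y), \\
  % Variational upper bound on the compression term:
  I(T;X) &\leq \mathbb{E}_{p(x)}\bigl[\operatorname{KL}\bigl(p(t|x)\,\|\,r(t)\bigr)\bigr], \\
  % Dropping the constant H(Y) and combining the two bounds gives the
  % objective maximised in practice:
  \mathcal{L}_{\mathrm{DVIB}} &= \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p(t|x)}\bigl[\log q(y|t)\bigr]
    - \beta\,\mathbb{E}_{p(x)}\bigl[\operatorname{KL}\bigl(p(t|x)\,\|\,r(t)\bigr)\bigr].
\end{align}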
