Asoodeh Shahab, Calmon Flavio P
School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA.
Entropy (Basel). 2020 Nov 20;22(11):1325. doi: 10.3390/e22111325.
Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding, and dependence dilution. Leveraging these connections, we prove a new cardinality bound on the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed "bottleneck problems", by replacing mutual information in IB and PF with other notions of mutual information, namely -information and Arimoto's mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. While the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of lower convex or upper concave envelope of certain functions. By applying this technique to a binary case, we derive closed form expressions for several bottleneck problems.
信息瓶颈(IB)和隐私漏斗(PF)是两个密切相关的优化问题,它们在机器学习、隐私算法设计、容量问题(如格伯夫人引理)以及强数据处理不等式等方面都有应用。在这项工作中,我们首先通过一个统一的理论框架研究IB和PF的函数性质。然后,我们将它们与三个信息论编码问题联系起来,即针对独立性的假设检验、有噪源编码和相关性稀释。利用这些联系,我们证明了IB中辅助变量的一个新的基数界,使得其对于离散随机变量的计算更易于处理。在第二部分,我们通过用其他互信息概念,即 - 信息和有元互信息,替换IB和PF中的互信息,引入了一类称为“瓶颈问题”的一般优化问题。然后我们认为,与IB和PF不同,这些问题在各种对准确性和隐私有统计约束的推理任务中能给出易于解释的保证。虽然底层的优化问题是非凸的,但我们开发了一种技术,通过将它们等效地表示为某些函数的下凸包络或上凹包络,以封闭形式评估瓶颈问题。通过将该技术应用于二元情况,我们推导出了几个瓶颈问题的封闭形式表达式。