On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches.

Authors

Lyu Zhaoyan, Aminian Gholamali, Rodrigues Miguel R D

Affiliations

Department of Electronic and Electrical Engineering, University College London, Gower St., London WC1E 6BT, UK.

The Alan Turing Institute, British Library, 96 Euston Rd., London NW1 2DB, UK.

Publication

Entropy (Basel). 2023 Jul 14;25(7):1063. doi: 10.3390/e25071063.

Abstract

It is well known that a neural network learning process (along with its connections to fitting, compression, and generalization) is not yet well understood. In this paper, we propose a novel approach to capturing such neural network dynamics using information-bottleneck-type techniques, involving the replacement of mutual information measures (which are notoriously difficult to estimate in high-dimensional spaces) by other, more tractable ones, including (1) the minimum mean-squared error associated with the reconstruction of the network input data from some intermediate network representation and (2) the cross-entropy associated with a certain class label given some network representation. We then conducted an empirical study in order to ascertain how different network models, network learning algorithms, and datasets may affect the learning dynamics. Our experiments show that our proposed approach appears to be more reliable in comparison with classical information bottleneck ones in capturing network dynamics during both the training and testing phases. Our experiments also reveal that the fitting and compression phases exist regardless of the choice of activation function. Additionally, our findings suggest that model architectures, training algorithms, and datasets that lead to better generalization tend to exhibit more pronounced fitting and compression phases.
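As a concrete reading of the two surrogate measures named in the abstract, the sketch below is a minimal illustration, not the authors' implementation: the synthetic data, the frozen two-layer network, and the fit_probe helper are all assumptions introduced here. A small decoder is trained on a frozen intermediate representation T to approximate the minimum mean-squared error of reconstructing the input X (surrogate 1), and a linear probe estimates the cross-entropy of the class label Y given T (surrogate 2).

```python
# Sketch of the two information-bottleneck surrogates described in the
# abstract. Assumptions: PyTorch; synthetic data standing in for a real
# dataset; a frozen toy network standing in for a trained model.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: X in R^16, four class labels, T = a frozen hidden layer.
X = torch.randn(2048, 16)
Y = torch.randint(0, 4, (2048,))
frozen_net = nn.Sequential(nn.Linear(16, 32), nn.Tanh())
with torch.no_grad():
    T = frozen_net(X)  # intermediate representation, held fixed below

def fit_probe(probe, inputs, targets, loss_fn, steps=500, lr=1e-2):
    """Train a small probe for a fixed number of steps; return final loss."""
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(probe(inputs), targets)
        loss.backward()
        opt.step()
    return loss.item()

# Surrogate (1): minimum mean-squared error of reconstructing X from T,
# approximated by the best small decoder we can fit. Lower values mean T
# retains more information about the input.
decoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
mmse_proxy = fit_probe(decoder, T, X, nn.MSELoss())

# Surrogate (2): cross-entropy of the class label Y given T, estimated by
# a linear probe. Lower values mean T is more predictive of the labels.
classifier = nn.Linear(32, 4)
ce_proxy = fit_probe(classifier, T, Y, nn.CrossEntropyLoss())

print(f"MMSE surrogate: {mmse_proxy:.4f}   CE surrogate: {ce_proxy:.4f}")
```

Tracking these two scalars across training checkpoints of the network would trace a fitting phase (label cross-entropy falling) and a compression phase (input reconstruction error rising), in the spirit of the information-plane analyses the paper revisits.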

[Graphical abstract] https://cdn.ncbi.nlm.nih.gov/pmc/blobs/631c/10377965/3d3d67c33720/entropy-25-01063-g0A1.jpg
