Lee Sungyeop, Jo Junghyo
Department of Physics and Astronomy, Seoul National University, Seoul 08826, Korea.
Department of Physics Education and Center for Theoretical Physics and Artificial Intelligence Institute, Seoul National University, Seoul 08826, Korea.
Entropy (Basel). 2021 Jul 5;23(7):862. doi: 10.3390/e23070862.
Deep learning methods have achieved outstanding performance in various fields. A fundamental question is why they are so effective. Information theory provides a potential answer by interpreting the learning process as the transmission and compression of data information. The information flows can be visualized on the information plane via the mutual information between the input, hidden, and output layers. In this study, we examine how the information flows are shaped by the network parameters, such as depth, sparsity, weight constraints, and hidden representations. Here, we adopt autoencoders as models of deep learning, because (i) they have clear guidelines for their information flows, and (ii) they come in various flavors, such as vanilla, sparse, tied, variational, and label autoencoders. We measured their information flows using the matrix-based Rényi α-order entropy functional. As learning progresses, the autoencoders show a typical fitting phase in which both the input-to-hidden and hidden-to-output mutual information increase. In the last stage of learning, however, some autoencoders show a simplifying phase, previously called the "compression phase", in which the input-to-hidden mutual information diminishes. In particular, sparsity regularization of the hidden activities amplifies the simplifying phase. However, tied, variational, and label autoencoders do not exhibit a simplifying phase. Nevertheless, all autoencoders have similar reconstruction errors for training and test data. Thus, the simplifying phase does not seem to be necessary for the generalization of learning.
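As an illustration of the kind of estimator the abstract refers to, the following is a minimal sketch of the matrix-based Rényi α-order entropy and the mutual information derived from it: a normalized Gram matrix (trace one) is built from sample activations with a Gaussian kernel, S_α(A) = log₂(tr A^α)/(1 − α) is evaluated from its eigenvalues, and I(X;Y) = S(A_X) + S(A_Y) − S(A_XY) uses the normalized Hadamard product for the joint term. The kernel width `sigma` and the choice α ≈ 1 are free parameters here; the paper's exact settings are not given in this record, so this is a generic sketch, not the authors' implementation.

```python
import numpy as np

def gram(X, sigma=1.0):
    """Trace-normalized Gram matrix from a Gaussian kernel over the rows of X."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    K = np.exp(-np.maximum(D, 0.0) / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=1.01):
    """Matrix-based Renyi entropy S_alpha(A) = log2(sum_i lambda_i^alpha) / (1 - alpha)."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def mutual_information(X, Y, alpha=1.01, sigma=1.0):
    """I(X;Y) = S(A_X) + S(A_Y) - S(A_XY), with A_XY the normalized Hadamard product."""
    Ax, Ay = gram(X, sigma), gram(Y, sigma)
    Axy = Ax * Ay
    Axy /= np.trace(Axy)
    return (renyi_entropy(Ax, alpha) + renyi_entropy(Ay, alpha)
            - renyi_entropy(Axy, alpha))
```

In an information-plane experiment, `X` would hold a mini-batch of inputs and `Y` the corresponding hidden (or output) activations recorded at each training epoch; α slightly above 1 approximates the Shannon limit while keeping the formula well-defined.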