Lu Chenguang
Intelligence Engineering and Mathematics Institute, Liaoning Technical University, Fuxin 123000, China.
School of Computer Engineering and Applied Mathematics, Changsha University, Changsha 410022, China.
Entropy (Basel). 2023 May 15;25(5):802. doi: 10.3390/e25050802.
A new trend in deep learning, represented by Mutual Information Neural Estimation (MINE) and Information Noise-Contrastive Estimation (InfoNCE), is emerging. In this trend, similarity functions and Estimated Mutual Information (EMI) are used as learning and objective functions. Coincidentally, EMI is essentially the same as the Semantic Mutual Information (SeMI) proposed by the author 30 years ago. This paper first reviews the evolutionary histories of semantic information measures and learning functions. Then, it briefly introduces the author's semantic information G theory with the rate-fidelity function R(G) (G denotes SeMI, and R(G) extends the rate-distortion function R(D)) and its applications to multi-label learning, maximum Mutual Information (MI) classification, and mixture models. Next, it discusses how we should understand the relationships between SeMI and Shannon's MI, two generalized entropies (fuzzy entropy and coverage entropy), autoencoders, Gibbs distributions, and partition functions from the perspective of the R(G) function or the G theory. An important conclusion is that mixture models and Restricted Boltzmann Machines converge because SeMI is maximized while Shannon's MI is minimized, making the information efficiency G/R close to 1. A potential opportunity is to simplify deep learning by using Gaussian channel mixture models to pre-train the latent layers of deep neural networks without considering gradients. The paper also discusses how the SeMI measure can serve as the reward function (reflecting purposiveness) for reinforcement learning. The G theory helps interpret deep learning but is far from sufficient by itself. Combining semantic information theory and deep learning will accelerate the development of both.
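For readers unfamiliar with the quantities named above, the following is a brief sketch in LaTeX of the standard MINE (Donsker-Varadhan) and InfoNCE lower bounds on Shannon's MI, together with the SeMI definition from the author's earlier work; the critic T_omega, the similarity function f, and the truth-function notation T(theta_j|x) are conventional choices assumed here, not fixed by the abstract itself.

\[
I(X;Y) \;\ge\; \mathbb{E}_{P_{XY}}\!\left[T_\omega(x,y)\right] \;-\; \log \mathbb{E}_{P_X \otimes P_Y}\!\left[e^{T_\omega(x,y)}\right] \quad \text{(MINE, Donsker--Varadhan bound)}
\]
\[
I(X;Y) \;\ge\; \log N \;+\; \mathbb{E}\!\left[\log \frac{f(x,y)}{\sum_{j=1}^{N} f(x_j,y)}\right] \quad \text{(InfoNCE bound with } N \text{ contrastive samples)}
\]
\[
G \;=\; I(X;\Theta) \;=\; \sum_{j}\sum_{i} P(x_i,y_j)\,\log \frac{T(\theta_j \mid x_i)}{T(\theta_j)}, \qquad T(\theta_j) \;=\; \sum_{i} P(x_i)\,T(\theta_j \mid x_i) \quad \text{(SeMI)}
\]

In this notation, the information efficiency discussed in the abstract is the ratio G/R, where R = I(X;Y) is Shannon's MI; both EMI bounds above are maximized over similarity functions, which is the sense in which EMI and SeMI coincide in form.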