Molavipour Sina, Ghourchian Hamid, Bassi Germán, Skoglund Mikael
School of Electrical Engineering and Computer Science (EECS), KTH Royal Institute of Technology, 100 44 Stockholm, Sweden.
Ericsson Research, 164 83 Stockholm, Sweden.
Entropy (Basel). 2021 May 21;23(6):641. doi: 10.3390/e23060641.
Novel approaches to estimating information measures using neural networks have attracted considerable attention in recent years in both the information theory and machine learning communities. These neural-based estimators have been shown to converge to the true values of mutual information and conditional mutual information when trained on independent samples. However, if the samples in the dataset are not independent, the consistency of these estimators requires further investigation. This is of particular interest for a more complex measure such as directed information, which is pivotal in characterizing causality and is meaningful for time-dependent variables. Extending the convergence proof to such cases is not trivial and demands further assumptions on the data. In this paper, we show that our neural estimator for conditional mutual information is consistent when the dataset is generated from a stationary and ergodic source. In other words, we show that our neural-network-based information estimator converges asymptotically to the true value with probability one. Besides the universal function approximation property of neural networks, a core lemma in the convergence proof is Birkhoff's ergodic theorem. Additionally, we apply the technique to estimate directed information and demonstrate the effectiveness of our approach in simulations.
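For context, estimators of this family are typically built on variational lower bounds on information measures. The following is a minimal sketch, assuming a MINE-style estimator based on the Donsker-Varadhan bound I(X;Y) >= E_P[T(x,y)] - log E_{P_X P_Y}[exp(T(x,y))]; the network `StatNet`, its architecture, and all hyperparameters are illustrative assumptions, not the authors' implementation, and the paper's contribution concerns the consistency of such estimators (extended to conditional mutual information and directed information) when the samples come from a stationary, ergodic source rather than being i.i.d.

```python
# Hypothetical sketch of a MINE-style neural mutual information estimator.
# Donsker-Varadhan bound: I(X;Y) >= E_P[T] - log E_{P_X P_Y}[exp(T)],
# where P is the joint law and P_X P_Y the product of marginals.
import math

import torch
import torch.nn as nn


class StatNet(nn.Module):
    """Statistics network T_theta(x, y); the architecture is illustrative."""

    def __init__(self, dim_x, dim_y, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))


def dv_mi_estimate(T, x, y):
    """Donsker-Varadhan lower bound on I(X;Y) from one mini-batch.

    Joint samples are the paired rows (x_i, y_i); product-of-marginals
    samples are emulated by shuffling y within the batch.
    """
    joint = T(x, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    scores = T(x, y_shuffled).squeeze(-1)
    marginal = torch.logsumexp(scores, dim=0) - math.log(y.size(0))
    return joint - marginal


# Usage sketch on a toy dependent pair: Y = X + 0.5*N with X, N ~ N(0, 1),
# for which I(X;Y) = 0.5 * ln(5) ~ 0.80 nats.
x = torch.randn(512, 1)
y = x + 0.5 * torch.randn(512, 1)
T = StatNet(1, 1)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = -dv_mi_estimate(T, x, y)  # gradient ascent on the bound
    loss.backward()
    opt.step()
print(float(dv_mi_estimate(T, x, y)))
```

With i.i.d. batches, batch averages converge to the expectations in the bound by the law of large numbers; the abstract's point is that for a single stationary, ergodic sample path, Birkhoff's ergodic theorem plays this role, so time averages along the path still converge to the required expectations with probability one.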