Zhang Jialin, Shi Jingyi
Department of Mathematics and Statistics, Mississippi State University, Mississippi State, MS 39762, USA.
Entropy (Basel). 2022 May 12;24(5):683. doi: 10.3390/e24050683.
Shannon's entropy is one of the building blocks of information theory and an essential aspect of Machine Learning (ML) methods (e.g., Random Forests). Yet, it is finitely defined only for distributions with fast-decaying tails on a countable alphabet. The unboundedness of Shannon's entropy over the general class of all distributions on an alphabet prevents its potential utility from being fully realized. To fill this void in the foundation of information theory, Zhang (2020) proposed generalized Shannon's entropy, which is finitely defined everywhere. The plug-in estimator, adopted in almost all entropy-based ML method packages, is one of the most popular approaches to estimating Shannon's entropy. The asymptotic distribution of the plug-in estimator of Shannon's entropy has been well studied in the existing literature. This paper studies the asymptotic properties of the plug-in estimator of generalized Shannon's entropy on countable alphabets. The developed asymptotic properties require no assumptions on the original distribution. The proposed asymptotic properties allow for interval estimation and statistical tests with generalized Shannon's entropy.
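For concreteness, the plug-in (maximum-likelihood) estimator mentioned above simply evaluates Shannon's entropy at the empirical distribution of the sample. A minimal sketch in Python (this is the standard plug-in estimator, not the generalized-entropy estimator developed in the paper; the function name `plugin_entropy` is illustrative):

```python
import math
from collections import Counter

def plugin_entropy(sample):
    """Plug-in estimator of Shannon's entropy (in nats):
    the entropy of the empirical distribution p_hat(k) = count(k) / n."""
    n = len(sample)
    counts = Counter(sample)
    # Sum runs only over observed letters, so p_hat > 0 throughout.
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: a balanced two-letter sample has entropy log(2) nats.
h = plugin_entropy(["a", "b", "a", "b"])
```

For a uniform sample over k observed letters, the estimate is log(k); with heavy-tailed distributions on an infinite alphabet, this estimator's bias and limiting behavior are exactly the issues the paper's asymptotic theory addresses for the generalized entropy.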