Silva, Jorge F.
Information and Decision System Group, Department of Electrical Engineering, Universidad de Chile, Av. Tupper 2007, Santiago 7591538, Chile.
Entropy (Basel). 2018 May 23;20(6):397. doi: 10.3390/e20060397.
This work addresses the problem of Shannon entropy estimation on countably infinite alphabets by studying and adopting recent convergence results for the entropy functional, which is known to be discontinuous in the space of probabilities on ∞-alphabets. Sufficient conditions for convergence of the entropy are used in conjunction with deviation inequalities, covering scenarios with both finitely and infinitely supported assumptions on the target distribution. From this perspective, four plug-in histogram-based estimators are studied, showing that the convergence results are instrumental in deriving new strongly consistent estimators of the entropy. The main application of this methodology is a new data-driven partition (plug-in) estimator. This scheme uses the data to restrict the support on which the distribution is estimated, finding an optimal balance between estimation and approximation errors. The proposed scheme offers a consistent (distribution-free) estimator of the entropy on ∞-alphabets, with optimal rates of convergence under certain regularity conditions on the problem (finite but unknown support, and tail-bound conditions on the target distribution).
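The plug-in idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual estimator: the `threshold` rule below is a hypothetical, simplified stand-in for the data-driven support selection that balances estimation and approximation errors.

```python
from collections import Counter
from math import log

def plugin_entropy(samples, threshold=None):
    """Plug-in (histogram-based) Shannon entropy estimate in nats.

    If `threshold` is given, the support is restricted to symbols whose
    empirical mass is at least `threshold` (a simplified, illustrative
    proxy for the paper's data-driven support restriction), and the
    restricted distribution is renormalized before evaluating entropy.
    """
    # Empirical distribution from the sample (the histogram step).
    counts = Counter(samples)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    if threshold is not None:
        # Restrict the support to frequently observed symbols, then
        # renormalize so the restricted masses sum to one.
        probs = [p for p in probs if p >= threshold]
        total = sum(probs)
        probs = [p / total for p in probs]
    # Plug the empirical distribution into the entropy functional.
    return -sum(p * log(p) for p in probs)
```

For example, `plugin_entropy([0, 0, 1, 1])` returns `log(2)`, the entropy of a fair binary distribution.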