Mielniczuk Jan
Institute of Computer Science, Polish Academy of Sciences, Jana Kazimierza 5, 01-248 Warsaw, Poland.
Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland.
Entropy (Basel). 2022 Aug 4;24(8):1079. doi: 10.3390/e24081079.
We review the principal information theoretic tools and their use for feature selection, with the main emphasis on classification problems with discrete features. Since it is known that empirical versions of conditional mutual information perform poorly for high-dimensional problems, we focus on various ways of constructing its counterparts, and on the properties and limitations of such methods. We present a unified way of constructing such measures based on truncation, or truncation and weighting, of the Möbius expansion of conditional mutual information. We also discuss the main approaches to feature selection which apply the introduced measures of conditional dependence, together with ways of assessing the quality of the obtained vector of predictors. This involves a discussion of recent results on asymptotic distributions of empirical counterparts of the criteria, as well as advances in resampling.
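As background for the abstract's central object, a minimal sketch of the plug-in (empirical) estimator of conditional mutual information for discrete variables is given below; it is this estimator whose poor small-sample, high-dimensional behavior motivates the modified measures surveyed in the review. The function name and interface are illustrative, not taken from the paper.

```python
from collections import Counter
from math import log

def cmi_plugin(x, y, z):
    """Plug-in estimator of conditional mutual information I(X; Y | Z),
    in nats, for equal-length sequences of discrete observations.

    Computes sum over observed cells of
        p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ],
    with all probabilities replaced by empirical frequencies.
    Note: this estimator is biased upward when the joint alphabet is
    large relative to the sample size, which is the difficulty the
    review's truncated/weighted measures are designed to address.
    """
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in n_xyz.items():
        # c * n_z / (n_xz * n_yz) equals the ratio of empirical probabilities,
        # since every count is implicitly divided by n.
        total += (c / n) * log(c * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)]))
    return total

# Illustrative checks: with Z constant, I(X; Y | Z) reduces to I(X; Y);
# with Y a deterministic copy of Z, conditioning on Z removes all dependence.
print(cmi_plugin([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]))  # log(2) ≈ 0.693
print(cmi_plugin([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0
```

In a filter-style feature-selection loop, such a score would be computed for each candidate feature X against the class label Y given the already-selected set Z, and the highest-scoring feature added greedily.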