Machine Learning Group - ICTEAM, Université catholique de Louvain, Place du Levant 3, 1348 Louvain-la-Neuve, Belgium.
Neural Netw. 2013 Dec;48:1-7. doi: 10.1016/j.neunet.2013.07.003. Epub 2013 Jul 11.
Feature selection is an important preprocessing step for many high-dimensional regression problems. One of the most common strategies is to select a relevant feature subset based on the mutual information criterion. However, no connection has yet been established in the machine learning literature between the use of mutual information and a regression error criterion. This is an important gap, since minimising such an error criterion is ultimately the objective one is interested in. This paper demonstrates that, under some reasonable assumptions, the features selected with the mutual information criterion are the ones minimising the mean squared error and the mean absolute error. Conversely, it is also shown that the mutual information criterion can fail to select optimal features in some situations, which we characterise. The theoretical developments presented in this work are expected to lead in practice to a more critical and efficient use of mutual information for feature selection.
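To make the criterion concrete, below is a minimal sketch, not the paper's own procedure, of greedy forward feature selection driven by mutual information estimates, followed by a cross-validated check of the mean squared error on the selected subset. The dataset (make_friedman1), the linear model, and the per-feature MI sum used as a crude proxy for the joint criterion I(X_S; Y) are all illustrative assumptions; the sketch relies on scikit-learn's mutual_info_regression estimator.

```python
# Sketch: greedy forward selection by an MI score, then an MSE check.
# All modelling choices here are illustrative, not the paper's method.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=500, n_features=10, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
for _ in range(5):  # greedily pick 5 features
    # mutual_info_regression estimates MI feature-by-feature, so summing
    # the estimates over the candidate subset is only a rough stand-in
    # for the joint criterion I(X_S; Y) discussed in the abstract.
    scores = {
        j: mutual_info_regression(X[:, selected + [j]], y,
                                  random_state=0).sum()
        for j in remaining
    }
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)

# Evaluate the regression error criterion on the selected subset.
mse = -cross_val_score(LinearRegression(), X[:, selected], y,
                       scoring="neg_mean_squared_error", cv=5).mean()
print("selected features:", sorted(selected), "CV MSE: %.3f" % mse)
```

On the Friedman #1 benchmark, where only the first five features are informative, such a run illustrates the abstract's point: when the MI estimates are reliable, the subset ranked highest by mutual information is also the one that yields a low mean squared error, while poor MI estimates or interacting features are exactly the situations in which the criterion can fail.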