Peng Hanyang, Fan Yong
College of Computer Science and Software Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen, Guangdong, 518060, PR China.
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences, 100190, Beijing, PR China.
Inf Sci (N Y). 2017 Dec;418-419:652-667. doi: 10.1016/j.ins.2017.08.036. Epub 2017 Aug 9.
A unified framework is proposed to select features by optimizing computationally feasible approximations of the high-dimensional conditional mutual information (CMI) between features and their associated class label under different assumptions. Under this unified framework, state-of-the-art information-theoretic feature selection algorithms are rederived, and a new algorithm is proposed that selects features by optimizing a lower bound of the CMI under a weaker assumption than those adopted by existing methods. The new feature selection method integrates a plug-in component that distinguishes redundant features from irrelevant ones, improving the robustness of feature selection. Furthermore, a novel metric is proposed to evaluate feature selection methods on simulated data. The proposed method has been compared with state-of-the-art feature selection methods using both the new evaluation metric and the classification performance of classifiers built upon the selected features. Experimental results demonstrate that the proposed method achieves promising performance across a variety of feature selection problems.
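To illustrate the family of methods the abstract describes, the following is a minimal sketch (not the authors' implementation) of greedy forward feature selection driven by a low-dimensional approximation of conditional mutual information. The scoring rule shown here is CMIM-style, one of the existing criteria the paper's framework subsumes: each candidate feature is scored by its worst-case conditional relevance I(f; y | s) over the already-selected features s. All function names and the discrete-data assumption are illustrative.

```python
# Hedged sketch: CMIM-style greedy feature selection on discrete data.
# This is one of the CMI approximations the unified framework covers,
# not the specific lower-bound criterion proposed in the paper.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_info(x, y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def cond_mutual_info(x, y, z):
    # I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(z) - entropy(list(zip(x, y, z))))

def cmim_select(features, labels, k):
    """Greedily pick k features; each candidate is scored by the minimum
    of I(candidate; labels | s) over already-selected features s, so a
    feature made redundant by any selected feature scores low."""
    selected = []
    remaining = list(range(len(features)))
    for _ in range(min(k, len(features))):
        def score(j):
            if not selected:
                return mutual_info(features[j], labels)
            return min(cond_mutual_info(features[j], labels, features[s])
                       for s in selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a toy dataset where one feature perfectly predicts the label, that feature is selected first; a redundant copy of it then scores zero conditional relevance, which is the redundancy-filtering behavior these criteria are designed for.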