Li Bing, Chow Tommy W S, Huang Di
Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chu Avenue, Kowloon, Hong Kong.
Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
J Intell Inf Syst. 2013 Oct 1;41(2):235-268. doi: 10.1007/s10844-013-0243-x.
In this paper, a novel feature selection method based on rough sets and mutual information is proposed. The dependency of each feature guides the selection, and mutual information is employed to eliminate features that do not significantly increase the dependency. As a result, the subset found by our method reaches maximum dependency with a small number of features. Because our method evaluates both definitive relevance and uncertain relevance through a combined selection criterion of dependency and a class-based distance metric, the selected feature subset is more relevant than those obtained by other rough-set-based methods, and it is therefore close to the optimal solution. To verify this contribution, eight different classification applications are used. Our method is also applied to a real Alzheimer's disease dataset, where it finds a feature subset with which classification accuracy reaches 81.3%. These results verify the contribution of our method.
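To make the two quantities combined in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It computes the rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U| (the fraction of objects whose values on feature subset B determine the class unambiguously) and the mutual information between a feature and the class, then runs a simplified greedy forward search that stops once the subset's dependency equals that of the full feature set. It assumes discrete (already discretized) features. The names `dependency`, `mutual_information`, and `select_features` are hypothetical. In this sketch mutual information only breaks ties between candidates with equal dependency, and the class-based distance metric mentioned in the abstract is omitted.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Sequence

Row = Sequence[int]


def dependency(rows: list, labels: list, subset: list) -> float:
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    if not rows:
        return 0.0
    # Group objects into equivalence classes by their values on `subset`.
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[j] for j in subset)].append(i)
    # A class belongs to the positive region if all its members share one label.
    pos = sum(len(idx) for idx in blocks.values()
              if len({labels[i] for i in idx}) == 1)
    return pos / len(rows)


def mutual_information(xs: list, ys: list) -> float:
    """I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y)).
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def select_features(rows: list, labels: list, eps: float = 1e-9) -> list:
    """Hypothetical greedy search: maximize dependency, break ties by MI."""
    n_features = len(rows[0])
    target = dependency(rows, labels, list(range(n_features)))
    relevance = {f: mutual_information([r[f] for r in rows], labels)
                 for f in range(n_features)}
    selected: list = []
    current = dependency(rows, labels, selected)
    remaining = list(range(n_features))
    while remaining and current < target - eps:
        # Dependency each candidate would reach if added to the current subset.
        deps = {f: dependency(rows, labels, selected + [f]) for f in remaining}
        best = max(remaining, key=lambda f: (deps[f], relevance[f]))
        selected.append(best)
        remaining.remove(best)
        current = deps[best]
    return selected


# Toy table: the label equals feature 1; features 0 and 2 carry no class information alone.
rows = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
labels = [0, 0, 1, 1]
print(select_features(rows, labels))
```

On this toy table the search picks feature 1 alone, since it already yields a dependency of 1.0, the same as the full feature set. Recomputing the positive region for every candidate is quadratic in the number of features; this favors readability over efficiency.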