Department of Pattern Recognition, Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague, Czech Republic.
IEEE Trans Pattern Anal Mach Intell. 2010 Nov;32(11):1921-39. doi: 10.1109/TPAMI.2010.34.
Stability (robustness) of feature selection methods is a topic of recent interest, yet its importance is often neglected, even though it directly affects the reliability of machine learning systems. We investigate the problem of evaluating the stability of feature selection processes that yield subsets of varying size. We introduce several novel feature selection stability measures and adjust some existing measures within a unifying framework that offers broad insight into the stability problem. We study the properties of the considered measures in detail and demonstrate on various examples what information about the feature selection process can be gained. We also introduce an alternative approach to feature selection evaluation: measures that quantify the similarity of two feature selection processes. These measures make it possible to compare, e.g., the outputs of two feature selection methods or two runs of one method with different parameters. The information obtained using the considered stability and similarity measures is shown to be usable for assessing feature selection methods (or criteria) as such.
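The abstract gives no formulas, so the following is only a minimal sketch, under our own assumptions, of the two kinds of measure it describes. Each run of a feature selection process is represented as a set of feature indices, and subsets may differ in size. Stability is illustrated by the pairwise Tanimoto (Jaccard) index averaged over all pairs of runs, and by a frequency-based weighted consistency; similarity of two processes is illustrated by the Tanimoto index averaged over all cross pairs. These are common formulations from the stability literature and may differ in detail (normalization, corrections for subset size) from the measures defined in the paper; all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Sequence

Subset = FrozenSet[int]  # one selected feature subset (feature indices)


def tanimoto(a: Subset, b: Subset) -> float:
    """Tanimoto (Jaccard) index |A & B| / |A | B|; works for subsets of different sizes."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty subsets count as identical
    return len(a & b) / len(union)


def average_tanimoto(runs: Sequence[Subset]) -> float:
    """Stability sketch: mean Tanimoto index over all pairs of runs of one process."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def weighted_consistency(runs: Sequence[Subset]) -> float:
    """Stability sketch: sum over features f of (F_f / N) * (F_f - 1) / (n - 1),
    where F_f = number of runs containing f, N = sum of all F_f, n = number of runs."""
    n = len(runs)
    if n < 2:
        raise ValueError("need at least two subsets")
    freq = Counter(f for subset in runs for f in subset)
    total = sum(freq.values())
    if total == 0:
        raise ValueError("all subsets are empty")
    return sum((c / total) * (c - 1) / (n - 1) for c in freq.values())


def inter_process_similarity(runs_a: Sequence[Subset], runs_b: Sequence[Subset]) -> float:
    """Similarity sketch: mean Tanimoto index over all cross pairs (A_i, B_j)."""
    if not runs_a or not runs_b:
        raise ValueError("both processes need at least one subset")
    total = sum(tanimoto(a, b) for a in runs_a for b in runs_b)
    return total / (len(runs_a) * len(runs_b))


if __name__ == "__main__":
    # Three runs of a hypothetical selector producing subsets of sizes 3, 2 and 4.
    runs = [frozenset({0, 1, 2}), frozenset({0, 1}), frozenset({0, 1, 3, 4})]
    print(average_tanimoto(runs))      # (2/3 + 2/5 + 1/2) / 3 = 47/90
    print(weighted_consistency(runs))  # only features 0 and 1 recur: 2/3
```

Both stability measures equal 1 when every run returns the same subset. The Tanimoto-based variant compares whole subsets pairwise, while the frequency-based variant only looks at how often each feature recurs, which is why the two can rank the same set of runs differently.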