Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou, China.
IEEE Trans Image Process. 2012 May;21(5):2379-88. doi: 10.1109/TIP.2012.2183879. Epub 2012 Jan 12.
The goal of feature selection is to identify the most informative features for compact representation, whereas the goal of active learning is to select the most informative instances for prediction. Previous studies address these two problems separately, despite the fact that selecting features and selecting instances are dual operations over a data matrix. In this paper, we consider the novel problem of simultaneously selecting the most informative features and instances, and we develop a solution from the perspective of optimum experimental design. That is, by using the selected features as the new representation and the selected instances as training data, the variance of the parameter estimate of a learning function can be minimized. Specifically, we propose a novel approach, called the Unified criterion for Feature and Instance selection (UFI), to simultaneously identify the most informative features and instances that minimize the trace of the parameter covariance matrix. A greedy algorithm is introduced to solve the optimization problem efficiently. Experimental results on two benchmark data sets demonstrate the effectiveness of our proposed method.
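The trace-of-covariance criterion described above is the A-optimality criterion from experimental design: for a ridge-regularized linear model fit on a selected row subset S, the parameter covariance is proportional to (X_S^T X_S + λI)^{-1}, and one seeks the subset whose covariance has the smallest trace. The sketch below illustrates the greedy idea for instance (row) selection only; it is not the authors' UFI algorithm, and the function name, the regularizer λ, and the restriction to rows are assumptions made for illustration.

```python
import numpy as np

def greedy_a_optimal_rows(X, k, lam=1e-2):
    """Greedily select k rows of X (instances) that minimize
    trace((X_S^T X_S + lam*I)^{-1}), the A-optimality objective.

    Illustrative sketch only: the ridge term lam*I keeps the matrix
    invertible before d linearly independent rows have been chosen.
    """
    n, d = X.shape
    selected, remaining = [], list(range(n))
    for _ in range(k):
        best_i, best_val = None, np.inf
        for i in remaining:
            # Objective value if row i were added to the current subset.
            S = selected + [i]
            M = X[S].T @ X[S] + lam * np.eye(d)
            val = np.trace(np.linalg.inv(M))
            if val < best_val:
                best_i, best_val = i, val
        selected.append(best_i)
        remaining.remove(best_i)
    return selected
```

Each greedy step costs one d×d inversion per candidate; rank-one update formulas (Sherman–Morrison) can reduce this, which is the kind of efficiency a practical implementation would rely on. Selecting features is the dual operation: apply the same procedure to the columns of X (i.e., to X^T).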