School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China.
IEEE Trans Pattern Anal Mach Intell. 2013 May;35(5):1178-92. doi: 10.1109/TPAMI.2012.197.
We propose a new online feature selection framework for applications with streaming features, where the full feature space is not known in advance. We define streaming features as features that arrive one by one over time while the number of training examples remains fixed. This contrasts with traditional online learning methods, which deal only with sequentially added observations and pay little attention to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method that selects strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and in a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
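The idea of keeping relevant and nonredundant features as they arrive can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the statistical tests, so the sketch assumes a Fisher z-test on Pearson and first-order partial correlations, and it limits redundancy checks to conditioning on a single already-selected feature. All function names (`pearson`, `partial_corr`, `fisher_p`, `select_streaming`) and the threshold `alpha` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    prod = (1 - rxz ** 2) * (1 - ryz ** 2)
    if prod <= 1e-12:  # z (almost) fully explains x or y
        return 0.0
    return (rxy - rxz * ryz) / math.sqrt(prod)


def fisher_p(r, n, k):
    """Two-sided p-value for H0: correlation is zero, given k conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


def select_streaming(feature_stream, y, alpha=0.05):
    """Process (name, column) pairs one at a time; return the names kept."""
    n = len(y)
    selected = {}
    for name, col in feature_stream:
        # Relevance check (assumed test): drop features unrelated to the target.
        if fisher_p(pearson(col, y), n, 0) > alpha:
            continue
        selected[name] = col
        # Redundancy check (simplified): a feature is dropped when it looks
        # independent of y given one other kept feature. The new arrival is
        # tested first, then the previously kept features are re-tested.
        order = [name] + [g for g in selected if g != name]
        for g in order:
            others = [h for h in selected if h != g]
            if any(fisher_p(partial_corr(selected[g], y, selected[h]), n, 1) > alpha
                   for h in others):
                del selected[g]
    return list(selected)


# Synthetic data: the number of examples is fixed, features arrive in sequence.
random.seed(0)
n_examples = 200
x1 = [random.gauss(0, 1) for _ in range(n_examples)]
x2 = [random.gauss(0, 1) for _ in range(n_examples)]
target = [a + b + 0.1 * random.gauss(0, 1) for a, b in zip(x1, x2)]
stream = [
    ("x1", x1),
    ("x2", x2),
    ("x1_copy", [2 * v for v in x1]),      # redundant given x1
    ("constant", [1.0] * n_examples),      # carries no information
]
kept = select_streaming(stream, target)
print(kept)
```

In this toy stream the scaled copy of `x1` passes the relevance check but is dropped by the redundancy check, and the constant column is rejected as irrelevant. Conditioning on larger subsets of kept features would be needed to detect redundancy that only appears jointly; the sketch omits this for brevity.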