Gregory Ditzler, Joseph LaBarck, James Ritchie, Gail Rosen, Robi Polikar
IEEE Trans Neural Netw Learn Syst. 2018 Sep;29(9):4504-4509. doi: 10.1109/TNNLS.2017.2746107. Epub 2017 Oct 11.
Feature subset selection can be used to sieve through large volumes of data and discover the most informative subset of variables for a particular learning problem. Yet, due to memory and other resource constraints (e.g., CPU availability), many state-of-the-art feature subset selection methods cannot be scaled to high-dimensional data or to data sets with an extremely large number of instances. In this brief, we extend online feature selection (OFS), a recently introduced approach that uses partial feature information, by developing an ensemble of online linear models to make predictions. OFS employs a linear model as the base classifier, which allows the $\ell_{0}$-norm of the parameter vector to be constrained to perform feature selection, leading to sparse linear models. We demonstrate that the proposed ensemble model typically yields a smaller error rate than any single linear model, while maintaining the same level of sparsity and complexity at the time of testing.
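The two ingredients described in the abstract, a linear learner whose weight vector is hard-truncated to satisfy an $\ell_{0}$ constraint, and an ensemble of such learners, can be sketched in a few lines of Python. The sketch below is illustrative rather than the authors' exact algorithm: the truncation keeps the $B$ largest-magnitude weights after each gradient step, and the ensemble is formed with Poisson-based online bagging in the style of Oza and Russell. Class names (`OFSClassifier`, `OnlineBaggedOFS`), hyperparameter values, and the toy data stream are all assumptions for demonstration, not from the brief.

```python
import numpy as np

def truncate(w, num_features):
    """l0 projection: zero all but the num_features largest-magnitude weights."""
    if np.count_nonzero(w) > num_features:
        keep = np.argsort(np.abs(w))[-num_features:]
        mask = np.zeros_like(w)
        mask[keep] = 1.0
        w = w * mask
    return w

class OFSClassifier:
    """Sparse online linear classifier (OFS-style sketch, assumed hinge-loss update)."""

    def __init__(self, dim, num_features, eta=0.2, lam=0.01):
        self.w = np.zeros(dim)           # parameter vector, kept sparse
        self.num_features = num_features # l0 budget B
        self.eta = eta                   # learning rate (assumed value)
        self.lam = lam                   # l2 regularization (assumed value)

    def predict(self, x):
        return 1.0 if float(self.w @ x) >= 0.0 else -1.0

    def update(self, x, y):
        if y * float(self.w @ x) <= 1.0:  # hinge-loss margin violation
            # regularized gradient step
            self.w = (1.0 - self.eta * self.lam) * self.w + self.eta * y * x
            # project onto an l2 ball, then truncate to enforce the l0 constraint
            norm = np.linalg.norm(self.w)
            radius = 1.0 / np.sqrt(self.lam)
            if norm > radius:
                self.w *= radius / norm
            self.w = truncate(self.w, self.num_features)
        else:
            self.w = (1.0 - self.eta * self.lam) * self.w

class OnlineBaggedOFS:
    """Ensemble of sparse online linear models trained with online (Poisson) bagging."""

    def __init__(self, dim, num_features, n_models=11, seed=0):
        self.models = [OFSClassifier(dim, num_features) for _ in range(n_models)]
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        votes = sum(m.predict(x) for m in self.models)  # majority vote
        return 1.0 if votes >= 0.0 else -1.0

    def update(self, x, y):
        # each learner sees the example k ~ Poisson(1) times, mimicking a bootstrap
        for m in self.models:
            for _ in range(self.rng.poisson(1.0)):
                m.update(x, y)

# Toy stream: only the first 5 of 100 features carry signal.
rng = np.random.default_rng(1)
dim, n = 100, 2000
X = rng.normal(size=(n, dim))
y = np.where(X[:, :5].sum(axis=1) >= 0.0, 1.0, -1.0)

ensemble = OnlineBaggedOFS(dim, num_features=10)
mistakes = 0
for x_t, y_t in zip(X, y):
    mistakes += ensemble.predict(x_t) != y_t  # test-then-train protocol
    ensemble.update(x_t, y_t)
print(f"online error rate: {mistakes / n:.3f}")
```

Because every base model is truncated to the same budget of $B$ nonzero weights, a majority vote over $M$ models costs $O(MB)$ per prediction, which is the sense in which the ensemble preserves sparsity and complexity at test time.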