A Sequential Learning Approach for Scaling Up Filter-Based Feature Subset Selection.

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2530-2544. doi: 10.1109/TNNLS.2017.2697407. Epub 2017 May 11.

Abstract

Many machine learning applications now involve data sets whose sizes were almost unimaginable just a short time ago. As a result, many current algorithms cannot handle, or do not scale to, today's extremely large volumes of data. Fortunately, not all features in a typical data set carry information that is relevant or useful for prediction, and identifying and removing such irrelevant features can significantly reduce the total data size. The dilemma, however, is that some current data sets are so large that common feature selection algorithms, whose very goal is to reduce dimensionality, cannot handle them, creating a vicious cycle. We describe a sequential learning framework for feature subset selection (SLSS) that scales with both the number of features and the number of observations. The proposed framework uses multi-armed bandit algorithms to sequentially search subsets of variables and assign a level of importance to each feature. The novel contribution of SLSS is its ability to scale naturally to large data sets, evaluate such data in a very small amount of time, and run independently of the optimization of any classifier, which avoids unnecessary complexity. We demonstrate the capabilities of SLSS on synthetic and real-world data sets.
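
The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea of treating each feature as a bandit arm, not the SLSS algorithm itself. It assumes an epsilon-greedy policy, absolute Pearson correlation as the filter criterion, and a reward of 1 for features ranked in the top few of the sampled subset; the function name `bandit_feature_scores` and all of its parameters are hypothetical. Each round touches only a small subset of features and a small batch of observations, and no classifier is trained.

```python
import numpy as np

def bandit_feature_scores(X, y, n_rounds=200, subset_size=10, keep=3,
                          batch_size=500, epsilon=0.1, seed=0):
    """Illustrative only: per-feature win rate under an epsilon-greedy bandit."""
    rng = np.random.default_rng(seed)
    n_obs, n_feat = X.shape
    pulls = np.zeros(n_feat)  # how often each feature (arm) was sampled
    wins = np.zeros(n_feat)   # how often it ranked in the top `keep`
    for _ in range(n_rounds):
        # Empirical win rate; arms never pulled get an optimistic 1.0
        means = np.where(pulls > 0, wins / np.maximum(pulls, 1), 1.0)
        if rng.random() < epsilon:
            # Explore: a uniformly random subset of features
            arms = rng.choice(n_feat, size=subset_size, replace=False)
        else:
            # Exploit: current best arms, with tiny jitter to break ties
            jitter = 1e-9 * rng.random(n_feat)
            arms = np.argsort(means + jitter)[-subset_size:]
        # Evaluate only a small random batch of observations
        rows = rng.choice(n_obs, size=min(batch_size, n_obs), replace=False)
        Xb, yb = X[rows][:, arms], y[rows]
        # Filter criterion (assumed): absolute Pearson correlation with y
        Xc = Xb - Xb.mean(axis=0)
        yc = yb - yb.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        corr = np.abs(Xc.T @ yc) / denom
        top = arms[np.argsort(corr)[-keep:]]
        pulls[arms] += 1
        wins[top] += 1
    # Importance level: fraction of pulls in which the feature was kept
    return np.where(pulls > 0, wins / np.maximum(pulls, 1), 0.0)

# Synthetic check: only features 0, 1 and 2 influence the target
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(5000, 50))
y = X[:, 0] + X[:, 1] + X[:, 2] + 0.1 * data_rng.normal(size=5000)
scores = bandit_feature_scores(X, y)
print(sorted(int(i) for i in np.argsort(scores)[-3:]))
```

The returned win rates play the role of the "level of importance" mentioned in the abstract. The cost per round depends on `subset_size` and `batch_size` rather than on the full data dimensions, which is the scaling property the sketch is meant to illustrate.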
