A new hybrid filter/wrapper algorithm for feature selection in classification.

Author information

Zhang Jixiong, Xiong Yanmei, Min Shungeng

Affiliation

College of Science, China Agricultural University, Beijing, 100193, PR China.

Publication information

Anal Chim Acta. 2019 Nov 8;1080:43-54. doi: 10.1016/j.aca.2019.06.054. Epub 2019 Jun 28.

Abstract

Feature selection can greatly enhance the performance of a learning algorithm when dealing with a high-dimensional data set. The filter method and the wrapper method are the two most common approaches, but both have limitations. The filter method evaluates and selects features with a criterion that is independent of any classifier, which is computationally efficient but less accurate than the wrapper method. The wrapper method uses a predetermined classifier to compute the evaluation, which can give high accuracy for that particular classifier but is computationally expensive. In this study, we introduce a new feature selection method that we refer to as the large margin hybrid algorithm for feature selection (LMFS). In this method, we first use a new distance-based evaluation function, under which samples from the same class are ideally close together and samples from different classes are far apart, together with a weighted bootstrapping search strategy, to find a set of candidate feature subsets. We then use a specific classifier and cross-validation to select the final feature subset from these candidates. Six vibrational spectroscopic data sets and three different classifiers, namely k-nearest neighbors, partial least squares discriminant analysis, and least squares support vector machine, were used to validate the performance of the LMFS method. The results revealed that LMFS can effectively overcome the over-fitting between the optimal feature subset and a given classifier. Compared with the filter and wrapper methods, the features selected by the LMFS method give better classification performance and model interpretability. Furthermore, LMFS can effectively overcome the impact of classifier complexity on computational time, and distance-based classifiers were found to be more suitable for selecting the final subset in LMFS.
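The two-stage idea described in the abstract (a distance-based filter stage that proposes candidate subsets, followed by a classifier-based wrapper stage that picks one of them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the evaluation function or the search details, so the margin score used here (nearest-miss distance minus nearest-hit distance, in the style of Relief-type methods), the weight update rule, the 1-nearest-neighbor wrapper, and all function names and parameters are assumptions chosen for illustration only.

```python
import numpy as np


def margin_score(X, y, features):
    """Assumed filter criterion: mean over samples of
    (distance to nearest other-class sample) - (distance to nearest same-class sample).
    Requires at least two samples per class."""
    Z = X[:, features]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbor
    same = y[:, None] == y[None, :]
    near_hit = np.where(same, D, np.inf).min(axis=1)
    near_miss = np.where(~same, D, np.inf).min(axis=1)
    return float(np.mean(near_miss - near_hit))


def candidate_subsets(X, y, n_iter=10, n_draws=50, keep=5, seed=0):
    """Assumed search: draw feature subsets by weighted bootstrap sampling,
    score them with margin_score, and reweight features by how often they
    appear in the best-scoring fifth of the subsets."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    weights = np.full(p, 1.0 / p)
    best = {}
    for _ in range(n_iter):
        scored = []
        for _ in range(n_draws):
            # Bootstrap: sample p features with replacement, keep the unique ones.
            draw = rng.choice(p, size=p, replace=True, p=weights)
            feats = tuple(int(i) for i in np.unique(draw))
            scored.append((margin_score(X, y, list(feats)), feats))
        scored.sort(reverse=True)
        top = scored[: max(1, n_draws // 5)]
        counts = np.zeros(p)
        for score, feats in top:
            counts[list(feats)] += 1
            best[feats] = score
        weights = counts / counts.sum()
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [list(feats) for feats, _ in ranked[:keep]]


def knn_cv_accuracy(X, y, features, n_folds=5, seed=0):
    """Wrapper criterion: k-fold cross-validated accuracy of a 1-nearest-neighbor classifier."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    Z = X[:, features]
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        D = np.linalg.norm(Z[test][:, None, :] - Z[train][None, :, :], axis=2)
        pred = y[train][D.argmin(axis=1)]
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)


def select_features(X, y, **search_kwargs):
    """Filter stage proposes candidates; wrapper stage picks the one with the best CV accuracy."""
    candidates = candidate_subsets(X, y, **search_kwargs)
    return max(candidates, key=lambda f: knn_cv_accuracy(X, y, f))
```

In this sketch the classifier is only run on the few candidates returned by the filter stage, which is one plausible reading of why a hybrid scheme would be less sensitive to classifier cost than a pure wrapper search. Calling `select_features(X, y)` on a samples-by-features array `X` with integer class labels `y` returns a list of column indices.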
