A Novel Feature Selection Method for High-Dimensional Mixed Decision Tables.

Author information

Thuy Nguyen Ngoc, Wongthanavasu Sartra

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Jul;33(7):3024-3037. doi: 10.1109/TNNLS.2020.3048080. Epub 2022 Jul 6.

Abstract

Attribute reduction, also called feature selection, is one of the central problems in rough set theory and a vital preprocessing step in pattern recognition, machine learning, and data mining. High-dimensional, mixed, and incomplete data sets are now common in real-world applications, and selecting a promising feature subset from such data is an interesting but challenging problem. Almost all existing methods generate a cover on the space of objects to determine important features. However, some tolerance classes in the cover are useless for the computation. This article therefore introduces the concept of stripped neighborhood covers, which removes unnecessary tolerance classes from the original cover. Based on the stripped neighborhood cover, we define a new reduct for mixed and incomplete decision tables and design an efficient heuristic algorithm to find it. In each iteration of the algorithm's main loop, an error measure is used to select an optimal feature, which is then added to the selected feature subset. To handle high-dimensional data sets more efficiently, we also identify redundant features after each iteration and remove them from the candidate feature subset. To verify the performance of the proposed algorithm, we run experiments on data sets downloaded from public data sources and compare it with existing state-of-the-art algorithms. The experimental results show that our algorithm outperforms the compared algorithms, especially in classification accuracy.
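
The loop described in the abstract (pick the feature with the lowest error, then discard redundant candidates) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the stripped neighborhood cover, the error measure, or the redundancy test, so everything below is an assumption made for illustration. Here two values are treated as tolerant when either is missing, when categorical values are equal, or when numeric values differ by at most a hypothetical threshold `delta`; "stripping" is assumed to mean dropping tolerance classes whose objects all share one decision; the error is assumed to be the number of classes that remain; and a candidate is assumed redundant when adding it leaves the stripped cover unchanged. All function names are hypothetical.

```python
from typing import Optional, Union

Value = Union[float, str, None]  # None marks a missing value


def tolerant(x: Value, y: Value, delta: float) -> bool:
    """Assumed tolerance test for one attribute of a mixed, incomplete table."""
    if x is None or y is None:          # a missing value tolerates anything
        return True
    if isinstance(x, str) or isinstance(y, str):
        return x == y                   # categorical: exact match
    return abs(x - y) <= delta          # numeric: within the threshold


def stripped_cover(rows, labels, features, delta):
    """Map each object to its tolerance class, keeping only mixed-decision classes.

    Dropping decision-pure classes is an assumed reading of "stripped",
    not the paper's definition.
    """
    cover = {}
    for i in range(len(rows)):
        cls = frozenset(
            j for j in range(len(rows))
            if all(tolerant(rows[i][a], rows[j][a], delta) for a in features)
        )
        if len({labels[j] for j in cls}) > 1:
            cover[i] = cls
    return cover


def select_features(rows, labels, delta=0.1):
    """Greedy forward selection with an assumed error = size of stripped cover."""
    candidates = list(range(len(rows[0])))
    selected = []
    cover = stripped_cover(rows, labels, selected, delta)
    # Error obtained with every feature: the best the loop can reach.
    target = len(stripped_cover(rows, labels, candidates, delta))
    while candidates and len(cover) > target:
        trial = {a: stripped_cover(rows, labels, selected + [a], delta)
                 for a in candidates}
        best = min(candidates, key=lambda a: len(trial[a]))
        selected.append(best)
        cover = trial[best]
        # Assumed redundancy test: the candidate no longer changes the cover.
        candidates = [
            a for a in candidates
            if a != best
            and stripped_cover(rows, labels, selected + [a], delta) != cover
        ]
    return selected


# Toy mixed table: numeric, categorical, numeric with a missing value.
rows = [
    [0.10, "a", 1.0],
    [0.12, "b", None],
    [0.90, "a", 1.0],
    [0.95, "b", 5.0],
]
labels = ["no", "no", "yes", "yes"]
selected = select_features(rows, labels, delta=0.1)
print(selected)  # [0]
```

On this toy table the first feature alone separates the two decisions, so the loop stops after one iteration and the other two candidates are discarded as redundant. Recomputing every cover from scratch costs quadratic time in the number of objects per candidate, which is acceptable for a sketch but not for the high-dimensional data the paper targets.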
