A New Representation in PSO for Discretization-Based Feature Selection.

Publication information

IEEE Trans Cybern. 2018 Jun;48(6):1733-1746. doi: 10.1109/TCYB.2017.2714145. Epub 2017 Jun 23.

Abstract

In machine learning, discretization and feature selection (FS) are important data-preprocessing techniques for improving the performance of an algorithm on high-dimensional data. Since many FS methods require discrete data, a common practice is to apply discretization before FS. In addition, for the sake of efficiency, features are usually discretized individually (i.e., univariate discretization). This scheme assumes that each feature influences the task independently, which may not hold when feature interactions exist. Univariate discretization may therefore degrade the performance of the FS stage, since information revealing feature interactions may be lost during discretization. Initial results of our previously proposed method, evolve particle swarm optimization (EPSO), showed that combining discretization and FS in a single stage using bare-bones particle swarm optimization (BBPSO) can lead to better performance than applying them in two separate stages. In this paper, we propose a new method called potential particle swarm optimization (PPSO), which employs a new representation that can reduce the search space of the problem and a new fitness function that better evaluates candidate solutions to guide the search. Results on ten high-dimensional datasets show that PPSO selects less than 5% of the features on every dataset. Compared with the two-stage approach that uses BBPSO for FS on the discretized data, PPSO achieves significantly higher accuracy on seven datasets. In addition, PPSO obtains better (or similar) classification performance than EPSO on eight datasets, with a smaller number of selected features on six datasets. Furthermore, PPSO outperforms the three compared (traditional) methods and performs similarly to one method on most datasets in terms of both generalization ability and learning capacity.
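The central idea above, searching discretization cut-points and the feature subset together with BBPSO instead of discretizing each feature first, can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' PPSO. It assumes each feature already has a list of candidate cut-points (the hypothetical `candidate_cuts`, which in practice might come from an entropy-based method), encodes one particle entry per feature whose rounded value either picks one cut-point or drops the feature, and uses a hypothetical fitness (training accuracy of a lookup table over the binarized selected features, minus a small per-feature penalty) in place of the paper's fitness function. The update step is the standard bare-bones rule: each coordinate is sampled from a Gaussian centred on the midpoint of the particle's personal best and the global best, with standard deviation equal to their absolute difference.

```python
import random
from collections import Counter, defaultdict


def decode(position, candidate_cuts):
    """Map a real-valued position to {feature_index: cut_point}.

    Hypothetical encoding: entry d is rounded to an integer k; 1 <= k <= len(cuts)
    picks cut-point cuts[k - 1] for feature d, any other value drops feature d.
    Features with no candidate cut-points can therefore never be selected.
    """
    chosen = {}
    for d, (x, cuts) in enumerate(zip(position, candidate_cuts)):
        k = int(round(x))
        if 1 <= k <= len(cuts):
            chosen[d] = cuts[k - 1]
    return chosen


def fitness(position, candidate_cuts, X, y, penalty=0.01):
    """Hypothetical fitness: training accuracy of a lookup table over the
    binarized selected features, minus a small penalty per selected feature."""
    chosen = decode(position, candidate_cuts)
    if not chosen:
        return 0.0
    groups = defaultdict(Counter)  # bit pattern -> class counts
    for row, label in zip(X, y):
        key = tuple(row[d] > cut for d, cut in sorted(chosen.items()))
        groups[key][label] += 1
    # Each bit pattern predicts its majority class.
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(y) - penalty * len(chosen)


def bbpso(candidate_cuts, X, y, n_particles=20, n_iter=50, seed=0):
    """Bare-bones PSO: no velocities; each new position is sampled around
    the midpoint of the particle's personal best and the global best."""
    rng = random.Random(seed)
    dims = len(candidate_cuts)
    # Start each entry uniformly in [0, number of cut-points for that feature].
    swarm = [[rng.uniform(0, len(c)) for c in candidate_cuts]
             for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p, candidate_cuts, X, y) for p in pbest]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            new = [rng.gauss((pbest[i][d] + gbest[d]) / 2,
                             abs(pbest[i][d] - gbest[d]))
                   for d in range(dims)]
            f = fitness(new, candidate_cuts, X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = new, f
                if f > gbest_fit:
                    gbest, gbest_fit = new[:], f
    return decode(gbest, candidate_cuts), gbest_fit


# Toy XOR data: the class is x0 XOR x1, and x2 is irrelevant.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, _ in X]
cuts = [[0.5], [0.5], [0.5]]  # assumed candidate cut-points per feature

if __name__ == "__main__":
    print(fitness([1, 0, 0], cuts, X, y))  # x0 alone
    print(fitness([1, 1, 0], cuts, X, y))  # x0 and x1 together
    print(bbpso(cuts, X, y))
```

On this toy data, x0 or x1 alone reaches only 50% training accuracy (chance level), while the pair separates the classes perfectly. A method that scores each feature in isolation would rate both as uninformative, whereas scoring cut-points and features jointly keeps the interaction. Dropping features that have no candidate cut-point also shrinks the space the swarm has to search.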
