A novel two-stage feature selection method based on random forest and improved genetic algorithm for enhancing classification in machine learning.

Authors

Ding Junyao, Du Jianchao, Wang Hejie, Xiao Song

Affiliations

School of Telecommunications Engineering, Xidian University, Xi'an, 710071, China.

Beijing Electronic Science and Technology Institute, Beijing, 100070, China.

Publication information

Sci Rep. 2025 May 14;15(1):16828. doi: 10.1038/s41598-025-01761-1.

Abstract

Data acquisition methods are becoming increasingly diverse and advanced, leading to higher-dimensional data, blurred classification boundaries, and a greater risk of overfitting, all of which reduce the accuracy of machine learning models. Many studies have sought to improve model performance through feature selection. However, any single feature selection method tends to be incomplete, unstable, or time-consuming. Combining the strengths of several feature selection methods can help overcome these shortcomings. This paper proposes a two-stage feature selection method based on random forest and an improved genetic algorithm. First, random forest importance scores are calculated and ranked, and low-scoring features are eliminated in a preliminary pass, which reduces the time complexity of the subsequent search. Then, the improved genetic algorithm searches further for the globally optimal feature subset. This stage uses a multi-objective fitness function to guide the search, minimizing the number of features in the subset while improving classification accuracy. The method also adds an adaptive mechanism and an evolution strategy to counter the loss of population diversity and the degeneration that occur in later iterations, thereby improving search efficiency. Experimental results on eight UCI datasets show that the proposed method significantly improves classification performance and has excellent feature selection capability.
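
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the fitness formula, the adaptive rule, or the evolution strategy, so the weighted-sum fitness, the weight `alpha`, the `diversity` measure, the mutation-rate formula, tournament selection, and two-individual elitism are all hypothetical stand-ins. Importance scores and the accuracy estimator are passed in as plain values and callables so that the sketch depends only on the standard library.

```python
import random
from typing import Callable, List, Sequence


def prefilter(importances: Sequence[float], keep: int) -> List[int]:
    """Stage 1: keep the indices of the `keep` highest-scoring features."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:keep])


def fitness(mask: Sequence[int], candidates: Sequence[int],
            accuracy: Callable[[List[int]], float], alpha: float = 0.9) -> float:
    """Assumed weighted sum: reward accuracy, penalise large subsets."""
    subset = [f for f, bit in zip(candidates, mask) if bit]
    if not subset:
        return 0.0  # an empty subset cannot classify anything
    size_term = 1.0 - len(subset) / len(candidates)
    return alpha * accuracy(subset) + (1.0 - alpha) * size_term


def diversity(population: Sequence[Sequence[int]]) -> float:
    """Assumed measure in [0, 1]; 0 means every individual is identical."""
    n, genes = len(population), len(population[0])
    total = 0.0
    for g in range(genes):
        p = sum(ind[g] for ind in population) / n  # share of 1-bits at gene g
        total += 4.0 * p * (1.0 - p)
    return total / genes


def genetic_search(candidates: Sequence[int],
                   accuracy: Callable[[List[int]], float],
                   pop_size: int = 20, generations: int = 30,
                   alpha: float = 0.9, seed: int = 0) -> List[int]:
    """Stage 2: search binary masks over the stage-1 candidates."""
    rng = random.Random(seed)
    n = len(candidates)

    def score(mask: Sequence[int]) -> float:
        return fitness(mask, candidates, accuracy, alpha)

    # Start from random masks plus the "keep everything" baseline.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([1] * n)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        # Adaptive step: mutate more aggressively as diversity collapses.
        rate = 0.02 + 0.2 * (1.0 - diversity(pop))
        nxt = [pop[0][:], pop[1][:]]  # elitism: the two best survive unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            nxt.append([bit ^ 1 if rng.random() < rate else bit
                        for bit in child])
        pop = nxt
    best = max(pop, key=score)
    return [f for f, bit in zip(candidates, best) if bit]


if __name__ == "__main__":
    # Toy data: features 0 and 3 are the only informative ones (made up).
    scores = [0.40, 0.05, 0.02, 0.30, 0.10, 0.01, 0.08, 0.04]
    toy_accuracy = lambda subset: 0.5 + 0.25 * len({0, 3} & set(subset))
    shortlist = prefilter(scores, keep=4)
    print(shortlist, genetic_search(shortlist, toy_accuracy))
```

In a real experiment, the importance scores could come from something like scikit-learn's `RandomForestClassifier.feature_importances_`, and `accuracy` could wrap a cross-validated classifier score on the selected columns; both choices are assumptions here rather than details taken from the paper. Because each fitness evaluation would then train a model, caching scores per mask is worth considering.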

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c4c0/12078713/f3c97003dd15/41598_2025_1761_Figa_HTML.jpg
