Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology.

Affiliation

School of Electricity and New Energy, China Three Gorges University, Yichang 443002, China.

Publication information

Comput Intell Neurosci. 2021 Sep 23;2021:3597051. doi: 10.1155/2021/3597051. eCollection 2021.

Abstract

With the rapid development of artificial intelligence in recent years, research on image processing, text mining, and genome informatics has steadily deepened, and the mining of large-scale databases has received increasing attention. The objects of data mining have also become more complex, and their dimensionality keeps growing. Relative to this very high dimensionality, the number of samples available for analysis is small, which produces high-dimensional small sample data. Such data expose the mining process to a severe curse of dimensionality. Feature selection can effectively remove redundant and noisy features from high-dimensional small sample data, mitigating the curse of dimensionality and improving the practical efficiency of mining algorithms. However, existing feature selection methods emphasize the classification or clustering performance of the selected features and neglect the stability of the selection results. This leads to unstable feature selection results and makes it difficult to obtain real, interpretable features. Building on traditional feature selection methods, this paper proposes an ensemble feature selection method, Random Bits Forest Recursive Clustering Eliminate (RBF-RCE), which combines multiple sets of base classifiers for parallel learning and screens out the best feature classification results. The method improves on the classification performance of traditional feature selection methods and also improves the stability of feature selection. The paper then analyzes the causes of feature selection instability and introduces a stability measure, the Intersection Measurement (IM), to evaluate whether the feature selection process is stable. The effectiveness of the proposed method is verified by experiments on several high-dimensional small sample data sets.
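The abstract does not give the formula for the Intersection Measurement (IM) or the internals of RBF-RCE, so the sketch below only illustrates the general idea of intersection-based stability: run a feature selector on several perturbed copies of the data and score how much the selected subsets overlap. Everything here is an assumption for illustration. `intersection_stability` uses a generic mean pairwise overlap |A ∩ B| / max(|A|, |B|), which is not necessarily the paper's IM definition; `top_k_by_mean_difference` is a hypothetical stand-in selector (ranking features by the absolute difference of class means), not RBF-RCE; and the stratified bootstrap is one assumed way to perturb a small sample.

```python
from itertools import combinations
import random


def intersection_stability(subsets):
    """Mean pairwise overlap |A ∩ B| / max(|A|, |B|) over all pairs of subsets.

    Returns 1.0 when every run selected the same features and 0.0 when no two
    runs share a feature. Generic intersection-based score assumed for
    illustration; not necessarily the paper's exact IM formula.
    """
    sets = [set(s) for s in subsets]
    if len(sets) < 2:
        raise ValueError("need at least two feature subsets")
    scores = []
    for a, b in combinations(sets, 2):
        denom = max(len(a), len(b))
        scores.append(len(a & b) / denom if denom else 1.0)
    return sum(scores) / len(scores)


def top_k_by_mean_difference(X, y, k):
    """Hypothetical stand-in selector (not RBF-RCE): rank features by
    |mean(class 1) - mean(class 0)| and return the indices of the top k."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def score(j):
        m1 = sum(r[j] for r in pos) / len(pos)
        m0 = sum(r[j] for r in neg) / len(neg)
        return abs(m1 - m0)

    ranked = sorted(range(len(X[0])), key=score, reverse=True)
    return ranked[:k]


def bootstrap_stability(X, y, k, n_runs=10, seed=0):
    """Re-run the selector on stratified bootstrap resamples and score
    the agreement of the selected subsets."""
    rng = random.Random(seed)
    # Group sample indices by class so every resample keeps both classes.
    classes = {}
    for i, label in enumerate(y):
        classes.setdefault(label, []).append(i)
    subsets = []
    for _ in range(n_runs):
        idx = [rng.choice(members) for members in classes.values() for _ in members]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        subsets.append(top_k_by_mean_difference(Xb, yb, k))
    return intersection_stability(subsets)


if __name__ == "__main__":
    # Toy high-dimensional small-sample data: 20 samples, 200 features,
    # where only features 0-4 carry a class signal.
    rng = random.Random(42)
    y = [0] * 10 + [1] * 10
    X = [[rng.gauss(1.5 if (label == 1 and j < 5) else 0.0, 1.0) for j in range(200)]
         for label in y]
    print(round(bootstrap_stability(X, y, k=5, n_runs=20), 3))
```

A score near 1.0 means the selector returns nearly the same features under small changes to the sample, which is the property the abstract argues existing methods neglect; swapping in a different selector only requires replacing `top_k_by_mean_difference`.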

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2404/8486514/809060ead5e8/CIN2021-3597051.001.jpg
