An efficient and interactive feature selection approach based on copula entropy for high-dimensional genetic data.

Authors

Yan Xiaoran, Shang Shilong, Li Dongxi, Dang Yun

Affiliations

College of Artificial Intelligence, Taiyuan University of Technology, Taiyuan, Shanxi, China.

College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, Shanxi, China.

Publication information

Sci Rep. 2025 Aug 17;15(1):30100. doi: 10.1038/s41598-025-15068-8.

Abstract

Feature selection (FS) is especially important for high-dimensional data. In this paper, we propose an efficient and interactive feature selection approach based on copula entropy (CEFS+). The method combines feature-feature mutual information with feature-label mutual information and uses a maximum-correlation minimum-redundancy strategy for greedy selection. It uses copula entropy as a measure of feature relevance that captures the full-order interaction gain among features. Moreover, we prove the divisibility of multivariate mutual information, derive a novel feature selection criterion, and propose a copula-entropy-based feature selection method called CEFS. To overcome the instability of CEFS on some datasets, we further propose CEFS+, an improved method based on a rank technique. Finally, we evaluate the effectiveness of CEFS and CEFS+ with three classifiers on five datasets. In 10 of the 15 scenarios (three classifiers × five datasets), our approach obtains the highest classification accuracy, much higher than that of six other commonly used FS methods. Our approach performs particularly well on high-dimensional genetic datasets.
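The greedy relevance/redundancy selection described above can be illustrated with a minimal Python sketch. It is not the authors' CEFS or CEFS+ criterion: it uses the classic pairwise mRMR difference score (feature-label relevance minus mean feature-feature redundancy), and it estimates each mutual information term on rank-transformed data, so that for two features it equals the negative of a histogram estimate of their copula entropy (using the identity I = -H_c). The function names `to_copula_bins`, `mutual_information`, and `mrmr_select`, the equal-frequency binning, and the `n_bins` parameter are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def to_copula_bins(values, n_bins):
    """Rank-transform a sample (empirical copula, u = rank / n) and map each
    pseudo-observation to one of n_bins equal-width bins on [0, 1).
    Because the ranks are uniform, these are equal-frequency bins."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # ties broken by position
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n  # integer arithmetic avoids float edge cases
    return bins


def mutual_information(xs, ys):
    """Plug-in mutual information (nats) between two discrete sequences.
    For two rank-binned features with equally filled bins, this equals the
    negative of a histogram estimate of their copula entropy (I = -H_c)."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mrmr_select(features, labels, k, n_bins=4):
    """Greedily pick k feature indices by relevance minus mean redundancy.
    features: list of columns (lists of floats); labels: list of class labels.
    Hypothetical helper: a pairwise mRMR sketch, not the CEFS/CEFS+ criterion."""
    binned = [to_copula_bins(col, n_bins) for col in features]
    relevance = [mutual_information(b, labels) for b in binned]  # feature-label MI
    selected, remaining = [], list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        # Mean feature-feature MI with the features already selected.
        redundancy = sum(mutual_information(binned[j], binned[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # the first index wins ties
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2, 3, 3]
    f0 = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]  # separates {0,1} from {2,3}
    f1 = [2 * v for v in f0]                       # rescaled copy of f0 (redundant)
    f2 = [0.1, 0.2, 0.9, 0.8, 0.3, 0.4, 0.7, 0.6]  # separates {0,2} from {1,3}
    print(mrmr_select([f0, f1, f2], labels, k=2, n_bins=2))  # prints [0, 2]
```

In the toy data all three features are equally relevant on their own, but once feature 0 is chosen its rescaled copy (feature 1) adds no new information, so the redundancy penalty steers the greedy loop to feature 2. Because ranks are invariant to monotone rescaling, the rank transform also makes the estimates insensitive to each feature's scale.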
