Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data.

Affiliations

School of Science, Kunming University of Science and Technology, Kunming, 650500, People's Republic of China.

School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK.

Publication information

BMC Bioinformatics. 2020 Mar 23;21(1):121. doi: 10.1186/s12859-020-3411-3.

DOI:10.1186/s12859-020-3411-3

PMID:32293252

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7092448/

Abstract

BACKGROUND

Feature selection in class-imbalance learning has gained increasing attention in recent years because of the massive growth of high-dimensional class-imbalanced data across many scientific fields. In addition to reducing model complexity and discovering key biomarkers, feature selection is an effective way of combating the class overlap that may arise in such data and that can become a crucial factor in classification performance. However, ordinary feature selection techniques for classification cannot simply be applied to class-imbalanced data without adjustment. More efficient feature selection techniques must therefore be developed for complicated class-imbalanced data, especially in the high-dimensional setting.

RESULTS

We proposed an algorithm called sssHD to achieve stable sparse feature selection and applied it to complicated class-imbalanced data. sssHD is based on the Hellinger distance (HD) coupled with sparse regularization techniques. We note that the Hellinger distance is not only class-insensitive but also translation-invariant. Simulation results indicate that the HD-based selection algorithm is effective in recognizing key features and controlling false discoveries in class-imbalance learning. Five gene expression datasets were also used to test the performance of the sssHD algorithm, and a comparison with several existing selection procedures was performed. The results show that sssHD is highly competitive in terms of five assessment metrics. In addition, sssHD shows only limited differences between performing and not performing re-balance preprocessing.
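The two properties named above can be illustrated with a minimal sketch. It is not the sssHD algorithm and does not reproduce the authors' code: it only scores a single feature, and it assumes that each class is approximately normal so that the closed-form Hellinger distance between two normal distributions applies. The normalisation H^2 = 1 - BC (with BC the Bhattacharyya coefficient, giving values in [0, 1]) is also an assumption; the paper may use a different scaling. The function names and the toy values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def hellinger_normal(mu0, sigma0, mu1, sigma1):
    """Hellinger distance between N(mu0, sigma0^2) and N(mu1, sigma1^2).

    Assumes the normalisation H^2 = 1 - BC, so the result lies in [0, 1],
    and assumes both standard deviations are non-zero.
    """
    var_sum = sigma0 ** 2 + sigma1 ** 2
    # Bhattacharyya coefficient of two univariate normal densities.
    bc = math.sqrt(2.0 * sigma0 * sigma1 / var_sum) * math.exp(
        -((mu0 - mu1) ** 2) / (4.0 * var_sum)
    )
    return math.sqrt(max(0.0, 1.0 - bc))


def hellinger_feature_score(x_neg, x_pos):
    """Plug-in score for one feature: fit a normal to each class separately."""
    return hellinger_normal(fmean(x_neg), pstdev(x_neg), fmean(x_pos), pstdev(x_pos))


# Hypothetical toy values for a single feature (not data from the paper).
majority = [0.1, -0.4, 0.3, 0.0, -0.2, 0.5, -0.1, 0.2]
minority = [1.9, 2.4, 2.1]

base = hellinger_feature_score(majority, minority)

# Translation invariance: adding the same constant to every sample
# leaves the score unchanged (up to floating-point rounding).
shifted = hellinger_feature_score(
    [v + 10.0 for v in majority], [v + 10.0 for v in minority]
)

# Class insensitivity: replicating the majority class changes the class
# ratio but not the per-class mean and spread, so the score is unchanged.
more_imbalanced = hellinger_feature_score(majority * 5, minority)

print(base, shifted, more_imbalanced)
```

The score depends only on the two class-conditional distributions, never on the class proportions, which is the sense in which the distance is insensitive to imbalance. Replicating samples is an artificial way to change the class ratio and is used here only to make that point visible.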

CONCLUSIONS

sssHD is a simple, practical feature selection method for high-dimensional class-imbalanced data and can serve as an alternative to existing selection procedures. It can be easily extended by combining it with different re-balance preprocessing methods, different sparse regularization structures, and different classifiers. As such, the algorithm is highly general and has a wide range of applicability.
