A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification.

Publication information

IEEE Trans Cybern. 2017 Dec;47(12):4263-4274. doi: 10.1109/TCYB.2016.2606104. Epub 2016 Oct 12.

Abstract

Under-sampling is a popular data preprocessing method for class imbalance problems. It balances the dataset to achieve a high classification rate and to avoid bias toward majority class examples. It always uses the full minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed that applies a noise filter before resampling. To verify its effectiveness, the scheme is implemented on top of four popular under-sampling methods, i.e., Undersampling + Adaboost, RUSBoost, UnderBagging, and EasyEnsemble, and evaluated through benchmarks and significance analysis. Furthermore, this paper summarizes the relationship between algorithm performance and imbalance ratio. Experimental results indicate that the proposed scheme significantly improves the original undersampling-based methods in terms of three popular metrics for imbalanced classification, i.e., the area under the curve, F-measure, and G-mean.
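The core idea, filtering noisy minority examples before under-sampling the majority class, can be illustrated with a minimal sketch. The abstract does not say which noise filter the authors use, so the k-nearest-neighbour rule below (a minority example counts as noisy when all of its k nearest neighbours belong to the majority class) is an assumption made for illustration only. The function names and the toy data are likewise hypothetical, and the boosting or bagging stage of the four listed methods is omitted.

```python
import numpy as np


def knn_noise_filter(X, y, minority_label, k=5):
    """Drop minority examples whose k nearest neighbours are all majority.

    ASSUMPTION: this k-NN rule is a stand-in; the abstract does not
    specify the filter used in the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all examples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    minority_votes = (y[nearest] == minority_label).sum(axis=1)
    noisy = (y == minority_label) & (minority_votes == 0)
    return X[~noisy], y[~noisy]


def random_under_sample(X, y, minority_label, rng):
    """Keep every minority example and an equal-sized random majority subset."""
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    n_keep = min(len(min_idx), len(maj_idx))
    chosen = rng.choice(maj_idx, size=n_keep, replace=False)
    idx = np.concatenate([min_idx, chosen])
    return X[idx], y[idx]


def noise_filtered_under_sample(X, y, minority_label=1, k=5, seed=0):
    """Noise filter first, then random under-sampling (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X_clean, y_clean = knn_noise_filter(X, y, minority_label, k)
    return random_under_sample(X_clean, y_clean, minority_label, rng)


def make_toy_data():
    """Toy data: a majority grid, a distant minority cluster, two planted noisy points."""
    grid = np.array([[i, j] for i in range(-5, 6) for j in range(-5, 6)], dtype=float)
    cluster = np.array([[20 + 0.1 * i, 20.0] for i in range(10)])
    planted = np.array([[0.5, 0.5], [-2.5, 3.5]])  # minority labels inside majority region
    X = np.vstack([grid, cluster, planted])
    y = np.concatenate([np.zeros(len(grid), int), np.ones(len(cluster) + len(planted), int)])
    return X, y


X, y = make_toy_data()
X_bal, y_bal = noise_filtered_under_sample(X, y, minority_label=1, k=5, seed=0)
print("minority kept:", int((y_bal == 1).sum()), "majority kept:", int((y_bal == 0).sum()))
```

In this sketch the balanced set would then be passed to a base learner or an ensemble. Running the filter before resampling means the size of the majority subset is set by the cleaned minority class rather than by the raw one.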
