Exploratory undersampling for class-imbalance learning.

Authors

Liu Xu-Ying, Wu Jianxin, Zhou Zhi-Hua

Affiliation

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China.

Publication

IEEE Trans Syst Man Cybern B Cybern. 2009 Apr;39(2):539-50. doi: 10.1109/TSMCB.2008.2007853. Epub 2008 Dec 16.

DOI: 10.1109/TSMCB.2008.2007853
PMID: 19095540

Abstract

Undersampling is a popular method in dealing with class-imbalance problems, which uses only a subset of the majority class and thus is very efficient. The main deficiency is that many majority class examples are ignored. We propose two algorithms to overcome this deficiency. EasyEnsemble samples several subsets from the majority class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade trains the learners sequentially, where in each step, the majority class examples that are correctly classified by the current trained learners are removed from further consideration. Experimental results show that both methods have higher Area Under the ROC Curve, F-measure, and G-mean values than many existing class-imbalance learning methods. Moreover, they have approximately the same training time as that of undersampling when the same number of weak classifiers is used, which is significantly faster than other methods.
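
The two procedures described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not name the base learner, the subset size, or the rule for combining outputs, so the sketch assumes balanced subsets (as many majority as minority examples), a hypothetical nearest-centroid scorer (`train_centroid`) as a stand-in learner, and a simple sum of scores in `predict`. The BalanceCascade sketch removes every majority example the current ensemble already classifies correctly, which is one straightforward reading of the abstract. `g_mean` uses the usual definition, the geometric mean of the true-positive and true-negative rates.

```python
import random

def train_centroid(X, y):
    """Hypothetical stand-in base learner: a nearest-centroid scorer."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos  # positive when x is closer to the minority centroid
    return score

def fit_balanced(minority, subset):
    # Train on all minority examples plus one sampled majority subset.
    X = minority + subset
    y = [1] * len(minority) + [0] * len(subset)
    return train_centroid(X, y)

def predict(learners, x):
    # Assumed combination rule: sum the raw scores of all learners.
    return 1 if sum(f(x) for f in learners) > 0 else 0

def easy_ensemble(minority, majority, n_subsets, rng):
    # Independent majority subsets, one learner per subset.
    return [fit_balanced(minority, rng.sample(majority, len(minority)))
            for _ in range(n_subsets)]

def balance_cascade(minority, majority, n_rounds, rng):
    learners, pool = [], list(majority)
    for _ in range(n_rounds):
        if len(pool) < len(minority):
            break  # too few majority examples left for a balanced subset
        learners.append(fit_balanced(minority, rng.sample(pool, len(minority))))
        # Keep only majority examples the current ensemble still misclassifies.
        pool = [x for x in pool if predict(learners, x) == 1]
    return learners

def g_mean(y_true, y_pred):
    # Geometric mean of true-positive rate and true-negative rate.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return ((tp / n_pos) * (tn / n_neg)) ** 0.5

# Synthetic, illustrative data: 10 minority and 200 majority points in 2-D.
rng = random.Random(0)
minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X = minority + majority
y = [1] * len(minority) + [0] * len(majority)

ee = easy_ensemble(minority, majority, 5, rng)
bc = balance_cascade(minority, majority, 5, rng)
for name, learners in [("EasyEnsemble", ee), ("BalanceCascade", bc)]:
    preds = [predict(learners, x) for x in X]
    print(name, len(learners), "learners, G-mean:", round(g_mean(y, preds), 3))
```

In both sketches each learner sees only a small balanced subset, which is why the abstract can describe training time as comparable to plain undersampling. The difference is that EasyEnsemble draws its subsets independently, while BalanceCascade draws each new subset from the majority examples the earlier learners have not yet handled.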

Similar articles

1
Exploratory undersampling for class-imbalance learning.
IEEE Trans Syst Man Cybern B Cybern. 2009 Apr;39(2):539-50. doi: 10.1109/TSMCB.2008.2007853. Epub 2008 Dec 16.
2
Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy.
Evol Comput. 2009 Fall;17(3):275-306. doi: 10.1162/evco.2009.17.3.275.
3
A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification.
IEEE Trans Cybern. 2017 Dec;47(12):4263-4274. doi: 10.1109/TCYB.2016.2606104. Epub 2016 Oct 12.
4
Developing new fitness functions in genetic programming for classification with unbalanced data.
IEEE Trans Syst Man Cybern B Cybern. 2012 Apr;42(2):406-21. doi: 10.1109/TSMCB.2011.2167144. Epub 2011 Sep 26.
5
Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems.
IEEE Trans Cybern. 2015 Nov;45(11):2402-12. doi: 10.1109/TCYB.2014.2372060. Epub 2014 Dec 2.
6
Indexes for three-class classification performance assessment--an empirical comparison.
IEEE Trans Inf Technol Biomed. 2009 May;13(3):300-12. doi: 10.1109/TITB.2008.2009440. Epub 2009 Jan 20.
7
Hashing-Based Undersampling Ensemble for Imbalanced Pattern Classification Problems.
IEEE Trans Cybern. 2022 Feb;52(2):1269-1279. doi: 10.1109/TCYB.2020.3000754. Epub 2022 Feb 16.
8
Efficient sparse kernel feature extraction based on partial least squares.
IEEE Trans Pattern Anal Mach Intell. 2009 Aug;31(8):1347-61. doi: 10.1109/TPAMI.2008.171.
9
Statistical instance-based pruning in ensembles of independent classifiers.
IEEE Trans Pattern Anal Mach Intell. 2009 Feb;31(2):364-9. doi: 10.1109/TPAMI.2008.204.
10
An application of methods for the probabilistic three-class classification of pregnancies of unknown location.
Artif Intell Med. 2009 Jun;46(2):139-54. doi: 10.1016/j.artmed.2008.12.003. Epub 2009 Jan 20.

Cited by

1
StackNAFLD: An Accurate Stacking Ensemble Learning Targeting NAFLD Treatment.
ACS Omega. 2025 Aug 15;10(33):37096-37114. doi: 10.1021/acsomega.5c01473. eCollection 2025 Aug 26.
2
Supervised Machine Learning Algorithms for Fitness-Based Cardiometabolic Risk Classification in Adolescents.
Sports (Basel). 2025 Aug 18;13(8):273. doi: 10.3390/sports13080273.
3
Advanced feature engineering in Acute:Chronic Workload Ratio (ACWR) calculation for injury forecasting in elite soccer.
PLoS One. 2025 Jul 23;20(7):e0327960. doi: 10.1371/journal.pone.0327960. eCollection 2025.
4
Adaptive Sampling Framework for Imbalanced DDoS Traffic Classification.
Sensors (Basel). 2025 Jun 24;25(13):3932. doi: 10.3390/s25133932.
5
Scale-invariant Optimal Sampling for Rare-events Data and Sparse Models.
Adv Neural Inf Process Syst. 2024;37:98384-98418.
6
Hybrid preprocessing and ensemble classification for enhanced detection of Parkinson's disease using multiple speech signal databases.
Digit Health. 2025 Jun 26;11:20552076251352941. doi: 10.1177/20552076251352941. eCollection 2025 Jan-Dec.
7
Evaluating how different balancing data techniques impact on prediction of premature birth using machine learning models.
PLoS One. 2025 Apr 2;20(3):e0316574. doi: 10.1371/journal.pone.0316574. eCollection 2025.
8
Impact of imbalanced features on large datasets.
Front Big Data. 2025 Mar 13;8:1455442. doi: 10.3389/fdata.2025.1455442. eCollection 2025.
9
Easy ensemble classifier-group and intersectional fairness and threshold (EEC-GIFT): a fairness-aware machine learning framework for lung cancer screening eligibility using real-world data.
JNCI Cancer Spectr. 2025 Mar 3;9(2). doi: 10.1093/jncics/pkaf030.
10
Machine Learning-Based Prediction of Early Complications Following Surgery for Intestinal Obstruction: Multicenter Retrospective Study.
J Med Internet Res. 2025 Mar 3;27:e68354. doi: 10.2196/68354.