Radial-Based Oversampling for Multiclass Imbalanced Data Classification.

Authors

Krawczyk Bartosz, Koziarski Michal, Wozniak Michal

Publication information

IEEE Trans Neural Netw Learn Syst. 2020 Aug;31(8):2818-2831. doi: 10.1109/TNNLS.2019.2913673. Epub 2019 Jun 21.

DOI: 10.1109/TNNLS.2019.2913673
PMID: 31247563
Abstract

Learning from imbalanced data is among the most popular topics in contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts are relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls for a better understanding of the relationship among classes. In this paper, we propose multiclass radial-based oversampling (MC-RBO), a novel data-sampling algorithm dedicated to multiclass problems. The main novelty of our method lies in using potential functions for generating artificial instances. We take into account information coming from all of the classes, contrary to existing multiclass oversampling approaches that use only minority class characteristics. The process of artificial instance generation is guided by exploring areas where the value of the mutual class distribution is very small. This way, we ensure a smart oversampling procedure that can cope with difficult data distributions and alleviate the shortcomings of existing methods. The usefulness of the MC-RBO algorithm is evaluated on the basis of an extensive experimental study and backed up with a thorough statistical analysis. The obtained results show that by taking into account information coming from all of the classes and conducting smart oversampling, we can significantly improve the process of learning from multiclass imbalanced data.
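
The abstract describes the idea only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a Gaussian radial basis function as the potential, treats all other classes together as the "majority" for each class being oversampled, and uses a random-step search that only accepts moves lowering the absolute value of the mutual class potential. The function names (`mutual_potential`, `climb`, `oversample`, `oversample_multiclass`) and the parameters `gamma`, `step`, and `iters` are hypothetical and chosen for illustration.

```python
import numpy as np


def mutual_potential(x, majority, minority, gamma=1.0):
    """Assumed Gaussian RBF potential: majority points add, minority points subtract."""
    def rbf_sum(points):
        sq_dist = np.sum((points - x) ** 2, axis=1)
        return np.sum(np.exp(-sq_dist / gamma ** 2))
    return rbf_sum(majority) - rbf_sum(minority)


def climb(start, majority, minority, rng, gamma=1.0, step=0.1, iters=50):
    """Random-step search that keeps a move only if |potential| decreases."""
    x = start.copy()
    best = abs(mutual_potential(x, majority, minority, gamma))
    for _ in range(iters):
        candidate = x + step * rng.standard_normal(x.shape)
        score = abs(mutual_potential(candidate, majority, minority, gamma))
        if score < best:
            x, best = candidate, score
    return x


def oversample(majority, minority, n_new, seed=0, **kwargs):
    """Generate n_new synthetic points, each started from a random minority point."""
    rng = np.random.default_rng(seed)
    starts = minority[rng.integers(len(minority), size=n_new)]
    return np.array([climb(s, majority, minority, rng, **kwargs) for s in starts])


def oversample_multiclass(X, y, seed=0, **kwargs):
    """Bring every class up to the size of the largest class (one-vs-rest, an assumption)."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for i, (label, count) in enumerate(zip(classes, counts)):
        if count == target:
            continue  # already the largest class, nothing to generate
        new = oversample(X[y != label], X[y == label], target - count,
                         seed=seed + i, **kwargs)
        X_parts.append(new)
        y_parts.append(np.full(len(new), label))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Calling `oversample_multiclass(X, y)` on a feature matrix `X` and an integer label vector `y` returns the original data followed by the synthetic points, with all classes balanced to the largest class size. Because the potential in this sketch is computed from every class, the generated points depend on where the other classes lie, which is the property the abstract contrasts with approaches that use only minority class characteristics.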

