Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification.

Authors

Salehi Amirreza, Khedmati Majid

Affiliations

Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran.

Department of Industrial Engineering, Sharif University of Technology, Azadi Ave., Tehran, 1458889694, Iran.

Publication information

Sci Rep. 2025 Jan 27;15(1):3460. doi: 10.1038/s41598-024-84786-2.

DOI:10.1038/s41598-024-84786-2
PMID:39870706
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11772689/
Abstract

Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have a low number of samples because they correspond to rare occurrences. To address the challenge of multiclass imbalance, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, this algorithm retains the most information during undersampling while generating efficient data in the minority class. The classification is carried out using one-vs-one and one-vs-all decomposition schemes. Extensive experimentation was carried out on 30 datasets to evaluate the proposed algorithm's performance. The results were subsequently compared with those of several state-of-the-art algorithms. Based on the results, the proposed algorithm outperforms the competing algorithms under different scenarios. Finally, the HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets.
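The abstract does not spell out the HCBOU procedure, so the following is only a generic illustration of hybrid cluster-based resampling, not the authors' algorithm. It assumes that a class counts as "majority" when it is larger than the mean class size (a hypothetical threshold), undersamples such classes by keeping the real sample nearest to each k-means centroid, and oversamples the remaining classes with SMOTE-style interpolation toward the nearest same-class neighbour. All function names are invented for this sketch.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def undersample(points, target, rng):
    """Keep `target` real samples, one per k-means centroid."""
    remaining = list(points)
    kept = []
    for c in kmeans(points, target, rng):
        best = min(remaining, key=lambda p: sq_dist(p, c))
        remaining.remove(best)  # each real sample is kept at most once
        kept.append(best)
    return kept


def oversample(points, target, rng):
    """Add interpolated samples until the class reaches `target`."""
    synthetic = []
    while len(points) + len(synthetic) < target:
        i = rng.randrange(len(points))
        base = points[i]
        others = points[:i] + points[i + 1:]
        if not others:  # single-sample class: fall back to duplication
            synthetic.append(base)
            continue
        neighbour = min(others, key=lambda p: sq_dist(p, base))
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return points + synthetic


def hybrid_resample(X, y, seed=0):
    """Balance every class to the mean class size (assumed threshold)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(tuple(features))
    target = round(sum(len(v) for v in by_class.values()) / len(by_class))
    X_out, y_out = [], []
    for label, points in by_class.items():
        if len(points) > target:
            points = undersample(points, target, rng)
        elif len(points) < target:
            points = oversample(points, target, rng)
        X_out.extend(points)
        y_out.extend([label] * len(points))
    return X_out, y_out
```

Calling `hybrid_resample(X, y)` on a list of feature tuples and a parallel list of labels returns a resampled pair in which every class has the same size. The balanced data could then be passed to a one-vs-one or one-vs-all classifier, for example scikit-learn's `OneVsOneClassifier` or `OneVsRestClassifier` wrappers; that pairing is likewise an assumption about how such a pipeline might be assembled, not a description of the paper's experimental setup.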

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e83f/11772689/1e191f1f4480/41598_2024_84786_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e83f/11772689/f43afc14ab9a/41598_2024_84786_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e83f/11772689/2e0eebe30f54/41598_2024_84786_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e83f/11772689/b2e5e3c4c3c3/41598_2024_84786_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e83f/11772689/fe9f6244df91/41598_2024_84786_Figa_HTML.jpg
