Salehi Amirreza, Khedmati Majid
Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran.
Department of Industrial Engineering, Sharif University of Technology, Azadi Ave., Tehran, 1458889694, Iran.
Sci Rep. 2025 Jan 27;15(1):3460. doi: 10.1038/s41598-024-84786-2.
Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have few samples because they correspond to rare occurrences. To address this challenge, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering the data and separating classes into majority and minority categories, the algorithm retains as much information as possible during undersampling while generating effective data in the minority class. Classification is carried out using one-vs-one and one-vs-all decomposition schemes. The proposed algorithm was evaluated through extensive experiments on 30 datasets, and the results were compared with those of several state-of-the-art algorithms. The proposed algorithm outperforms the competing algorithms under different scenarios. Finally, the HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets.
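The general idea of hybrid cluster-based resampling can be illustrated with a minimal sketch. This is not the HCBOU algorithm itself: the abstract does not specify the clustering method, the majority/minority threshold, or how new samples are generated. The sketch assumes plain k-means clustering, the mean class size as the balancing target, proportional per-cluster undersampling for majority classes, and SMOTE-like interpolation within clusters for minority classes. The names `kmeans` and `hybrid_resample` and the parameter `k` are hypothetical, and the one-vs-one and one-vs-all classification step is not covered.

```python
import random
from collections import Counter, defaultdict


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    k = min(k, len(points))
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Recompute centers; an empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def hybrid_resample(X, y, k=3, seed=0):
    """Balance classes toward the mean class size (an assumed target)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(tuple(xi))
    target = round(len(X) / len(by_class))

    X_out, y_out = [], []
    for label, pts in by_class.items():
        clusters = kmeans(pts, k, rng)
        if len(pts) > target:
            # Majority class: keep a proportional share of every cluster so
            # that each region of the class stays represented.
            new_pts = []
            for cl in clusters:
                n_keep = max(1, round(target * len(cl) / len(pts)))
                new_pts.extend(rng.sample(cl, min(n_keep, len(cl))))
        else:
            # Minority class: keep all originals and add points interpolated
            # between two members of the same cluster.
            new_pts = list(pts)
            while len(new_pts) < target:
                cl = rng.choice(clusters)
                a, b = rng.choice(cl), rng.choice(cl)
                t = rng.random()
                new_pts.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
        X_out.extend(new_pts)
        y_out.extend([label] * len(new_pts))
    return X_out, y_out


# Toy data (made up for illustration): one large class and two rare ones.
data_rng = random.Random(1)
sizes = {0: 60, 1: 9, 2: 6}
X, y = [], []
for label, n in sizes.items():
    for _ in range(n):
        X.append((data_rng.gauss(3 * label, 1.0), data_rng.gauss(0.0, 1.0)))
        y.append(label)

X_res, y_res = hybrid_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```

Because the majority class is cut cluster by cluster, its final size can differ from the target by a few samples due to rounding; the minority classes are filled to the target exactly. Interpolating only within a cluster is a design choice that avoids placing synthetic points between distant groups of the same class.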