Ning Zhihan, Jiang Zhixing, Zhang David
IEEE Trans Neural Netw Learn Syst. 2024 Apr 8;PP. doi: 10.1109/TNNLS.2024.3383672.
Real-world datasets are often imbalanced, which challenges canonical machine learning algorithms that assume a balanced class distribution. The imbalance problem becomes more complicated when the dataset is multiclass. Although many approaches have been presented for imbalanced learning (IL), research on the multiclass imbalanced problem remains relatively limited. To alleviate these issues, we propose a forest of evolutionary hierarchical classifiers (FEHC) method for multiclass IL (MCIL). FEHC can be seen as a classifier fusion framework with a forest structure: it aggregates several evolutionary hierarchical multiclassifiers (EHMCs) to reduce generalization error. Specifically, a multichromosome genetic algorithm (MCGA) is designed to simultaneously select (sub)optimal features, classifiers, and hierarchical structures when generating these EHMCs. The MCGA adopts a dynamic weighting module to learn difficult classes and promote the diversity of FEHC. We also present the "stratified underbagging" (SUB) strategy to address class imbalance and the "soft tree traversal" (STT) strategy to make FEHC converge faster and to better solutions. We thoroughly evaluate the proposed algorithm on 14 multiclass imbalanced datasets with various properties. Compared with popular and state-of-the-art approaches, FEHC obtains better performance under different evaluation metrics. The code is publicly available on GitHub: https://github.com/CUHKSZ-NING/FEHCClassifier
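The abstract does not spell out how SUB works, so the following is only a minimal sketch of generic underbagging, the family of techniques the name suggests, and not the authors' implementation. It assumes that each bag is built by undersampling every class, without replacement, down to the size of the rarest class. The function name `balanced_bags` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def balanced_bags(labels, n_bags, seed=0):
    """Return n_bags lists of sample indices, each balanced across classes.

    Assumption: every bag draws, without replacement, as many samples from
    each class as the rarest class contains. This is generic underbagging,
    not the SUB strategy from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # The rarest class fixes the per-class sample count in every bag.
    per_class = min(len(indices) for indices in by_class.values())

    bags = []
    for _ in range(n_bags):
        bag = []
        for indices in by_class.values():
            bag.extend(rng.sample(indices, per_class))
        bags.append(sorted(bag))
    return bags


# Toy multiclass imbalanced label vector: 50 / 10 / 4 samples per class.
labels = ["a"] * 50 + ["b"] * 10 + ["c"] * 4
bags = balanced_bags(labels, n_bags=3)
for bag in bags:
    counts = {c: sum(labels[i] == c for i in bag) for c in "abc"}
    print(len(bag), counts)
```

In this toy run each bag holds 12 indices, four per class. One base classifier would then be trained per bag and the predictions aggregated. How FEHC couples such bags with the EHMCs is described in the paper and the linked repository, not here.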