Wang Zhe, Cao Chenjie, Zhu Yujin
IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5178-5191. doi: 10.1109/TNNLS.2020.2964585. Epub 2020 Nov 30.
In this article, we propose a novel entropy- and confidence-based undersampling boosting (ECUBoost) framework to solve imbalanced classification problems. A boosting-based ensemble is combined with a new undersampling method to improve generalization performance. To avoid losing informative samples during the data preprocessing of the boosting-based ensemble, ECUBoost uses both confidence and entropy as benchmarks to preserve the validity and structural distribution of the majority samples during undersampling. Moreover, unlike other iterative dynamic resampling methods, the confidence-based ECUBoost can be applied to base learners that are not trained iteratively, such as decision trees. Random forests are used as the base classifiers in ECUBoost. Finally, experimental results on both artificial data sets and KEEL data sets demonstrate the effectiveness of the proposed method.
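The abstract does not give implementation details, but the Python sketch below illustrates the general idea it describes: inside a boosting-style loop, majority samples are scored by confidence and entropy and only a balanced subset is kept for each round, with a random forest as the base classifier. The function names, the specific ranking rule (`entropy - confidence`), and all hyperparameters are assumptions for illustration, not the authors' actual ECUBoost formulation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def ecuboost_sketch(X, y, n_rounds=5, random_state=0):
    """Train an ensemble on balanced subsets chosen by confidence and entropy.

    Hypothetical sketch only: the real ECUBoost combines the two benchmarks
    with its own weighting scheme, which the abstract does not specify.
    """
    rng = np.random.RandomState(random_state)
    maj_label = np.bincount(y).argmax()            # majority class label
    maj_idx = np.flatnonzero(y == maj_label)       # indices of majority samples
    min_idx = np.flatnonzero(y != maj_label)       # indices of minority samples

    ensemble = []
    # First round: random balanced subset, since no model exists yet.
    chosen = rng.choice(maj_idx, size=len(min_idx), replace=False)
    for _ in range(n_rounds):
        sub = np.concatenate([chosen, min_idx])
        clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
        clf.fit(X[sub], y[sub])
        ensemble.append(clf)

        # Score every majority sample with the current model.
        proba = clf.predict_proba(X[maj_idx])
        maj_col = list(clf.classes_).index(maj_label)
        confidence = proba[:, maj_col]                              # validity benchmark
        entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)   # distribution benchmark

        # Assumed ranking rule: prefer informative (low-confidence) samples
        # while keeping structural diversity (high entropy).
        score = entropy - confidence
        order = np.argsort(score)[::-1]
        chosen = maj_idx[order[: len(min_idx)]]
    return ensemble

def predict(ensemble, X):
    """Average class probabilities across rounds (simple soft vote)."""
    proba = np.mean([clf.predict_proba(X) for clf in ensemble], axis=0)
    return ensemble[0].classes_[np.argmax(proba, axis=1)]
```

A usage example would be `ensemble = ecuboost_sketch(X_train, y_train)` followed by `predict(ensemble, X_test)`; the balanced subset in each round has one majority sample per minority sample, which is one common undersampling ratio but not necessarily the one used in the paper.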