IEEE Trans Neural Netw Learn Syst. 2020 Apr;31(4):1387-1400. doi: 10.1109/TNNLS.2019.2920246. Epub 2019 Jun 28.
The class imbalance problem has become a leading challenge in classification learning. Although conventional imbalance learning methods have been proposed to tackle this problem, they have notable limitations: 1) undersampling methods risk discarding important information and 2) cost-sensitive methods are sensitive to outliers and noise. To address these issues, we propose a hybrid optimal ensemble classifier framework that combines density-based undersampling and cost-sensitive methods, exploring state-of-the-art solutions with a multi-objective optimization algorithm. Specifically, we first develop a density-based undersampling method that selects informative samples from the original training data via a probability-based data transformation, which yields multiple subsets with a balanced distribution across classes. Second, we exploit a cost-sensitive classification method to address the problem of incomplete information by modifying the weights of misclassified minority samples rather than those of the majority class. Finally, we introduce a multi-objective optimization procedure and utilize the connections between samples to self-modify the classification results within an ensemble classifier framework. Extensive comparative experiments on real-world data sets demonstrate that our method outperforms the majority of imbalance and ensemble classification approaches.
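The abstract does not specify the exact form of the density-based undersampling or the probability-based data transformation. The following is a minimal sketch, assuming that "density" is approximated by the mean distance to the k nearest neighbours within the majority class and that sparser (lower-density) majority points are retained with higher probability so that each drawn subset is class-balanced. All function names (density_based_undersample, balanced_subsets) and parameters (k, n_subsets) are illustrative, not the authors' implementation.

```python
import numpy as np

def density_based_undersample(X_maj, n_keep, k=5, rng=None):
    """Keep n_keep majority samples, favouring low-density (sparser) points.

    Density is approximated here by the mean distance to the k nearest
    neighbours; this stands in for the paper's probability-based data
    transformation, whose exact form is not given in the abstract.
    """
    rng = np.random.default_rng(rng)
    # Pairwise Euclidean distances within the majority class.
    d = np.linalg.norm(X_maj[:, None, :] - X_maj[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn_dist = np.sort(d, axis=1)[:, :k].mean(axis=1)  # mean k-NN distance
    prob = knn_dist / knn_dist.sum()                    # sparse points favoured
    idx = rng.choice(len(X_maj), size=n_keep, replace=False, p=prob)
    return X_maj[idx]

def balanced_subsets(X, y, n_subsets=5, rng=None):
    """Build several class-balanced training subsets by undersampling the majority class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    X_min, X_maj = X[y == minority], X[y == majority]
    subsets = []
    for _ in range(n_subsets):
        X_sel = density_based_undersample(X_maj, len(X_min),
                                          rng=int(rng.integers(1 << 31)))
        Xs = np.vstack([X_min, X_sel])
        ys = np.hstack([np.full(len(X_min), minority),
                        np.full(len(X_sel), majority)])
        subsets.append((Xs, ys))
    return subsets
```

Under this reading, each balanced subset would train one base classifier of the ensemble with a cost-sensitive loss that up-weights misclassified minority samples; the multi-objective optimization over these configurations is not reproduced in this sketch.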