IEEE Trans Neural Netw Learn Syst. 2022 Jul;33(7):2940-2951. doi: 10.1109/TNNLS.2020.3047335. Epub 2022 Jul 6.
Imbalanced class distribution is an inherent problem in many real-world classification tasks where the minority class is the class of interest. Many conventional statistical and machine learning classification algorithms are subject to frequency bias, making it challenging to learn discriminative boundaries between the minority and majority classes. To address class distribution imbalance in deep learning, we propose a class rebalancing strategy based on a class-balanced, dynamically weighted loss function, where weights are assigned according to class frequency and the predicted probability of the ground-truth class. The dynamic weighting scheme self-adapts its weights to the prediction scores, allowing the model to adjust for instances of varying difficulty so that gradient updates are driven by hard minority-class samples. We further show that the proposed loss function is classification calibrated. Experiments on highly imbalanced data from different application domains, cyber intrusion detection (CICIDS2017 data set) and medical imaging (ISIC2019 data set), show robust generalization. Theoretical results, supported by superior empirical performance, justify the validity of the proposed dynamically weighted balanced (DWB) loss function.
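To make the idea concrete, the sketch below combines a class-frequency term with a dynamic term that depends on the predicted probability of the ground-truth class. This is a hypothetical illustration, not the paper's exact DWB formulation: the effective-number class weighting and the `(1 - p_t)` modulating factor are assumed stand-ins for the two ingredients the abstract names (class frequency and prediction score).

```python
import numpy as np

def dwb_loss_sketch(probs, labels, class_counts, beta=0.999):
    """Sketch of a class-balanced, dynamically weighted loss.

    probs: (N, C) array of predicted class probabilities
    labels: (N,) array of integer ground-truth labels
    class_counts: (C,) training-set frequency of each class
    """
    class_counts = np.asarray(class_counts, dtype=float)

    # Class-frequency term: inverse "effective number" of samples,
    # so rarer classes receive larger weights (an assumed choice).
    eff_num = (1.0 - beta ** class_counts) / (1.0 - beta)
    cb_weights = 1.0 / eff_num
    cb_weights = cb_weights / cb_weights.sum() * len(class_counts)

    # Dynamic term: probability assigned to the true class; easy
    # (high p_t) examples are down-weighted, so gradients are
    # dominated by hard minority-class samples.
    p_t = probs[np.arange(len(labels)), labels]
    dynamic = 1.0 - p_t

    loss = -cb_weights[labels] * dynamic * np.log(p_t + 1e-12)
    return loss.mean()
```

Under this sketch, a confidently correct prediction on a majority-class sample contributes almost nothing to the loss, while a low-confidence prediction on a minority-class sample is amplified by both terms at once.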