Department of Radiology, Duke University School of Medicine, Durham, NC, USA; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
Neural Netw. 2018 Oct;106:249-259. doi: 10.1016/j.neunet.2018.07.011. Epub 2018 Jul 29.
In this study, we systematically investigate the impact of class imbalance on the classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is the area under the receiver operating characteristic curve (ROC AUC) adapted to multi-class tasks, since the overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on the results of our experiments we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that completely eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of imbalance; (iv) as opposed to some classical machine learning models, oversampling does not cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest.
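Two of the methods compared above can be illustrated compactly. The following is a minimal sketch, not the authors' implementation: random oversampling that replicates minority-class examples until the imbalance is completely eliminated (conclusion iii), and thresholding that compensates for prior class probabilities by dividing a classifier's output scores by the empirical class priors (conclusion v). The helper names and the toy label distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced label set: class 0 heavily outnumbers class 1.
labels = np.array([0] * 90 + [1] * 10)

def oversample_indices(labels, rng):
    """Illustrative helper: return sample indices in which every class is
    replicated (with replacement) up to the majority-class count, so the
    imbalance is fully eliminated."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == c)
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx.append(np.concatenate([cls_idx, extra]))
    return np.concatenate(idx)

balanced = labels[oversample_indices(labels, rng)]  # now 90 per class

def threshold_by_priors(probs, priors):
    """Illustrative post-hoc thresholding: divide predicted class
    probabilities by the prior class probabilities, then renormalize."""
    adjusted = probs / priors
    return adjusted / adjusted.sum(axis=1, keepdims=True)

priors = np.array([0.9, 0.1])           # empirical class priors
probs = np.array([[0.6, 0.4]])          # raw softmax output for one sample
adj = threshold_by_priors(probs, priors)
# After compensation, the minority class wins for this sample even though
# its raw probability was lower.
```

A network trained on such imbalanced data tends to be biased toward the majority class; dividing by the priors undoes that bias when the goal is the overall number of properly classified cases.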