Department of Computer and Information Science, University of Macau, Macau.
Department of Electromechanical Engineering, University of Macau, Macau.
Neural Netw. 2017 Dec;96:101-114. doi: 10.1016/j.neunet.2017.09.004. Epub 2017 Sep 14.
In this paper, a novel imbalance learning method for binary classes is proposed, named Post-Boosting of classification boundary for Imbalanced data (PBI), which can significantly improve the performance of the classification boundary of any trained neural network (NN). The PBI procedure consists of just two steps: an (imbalance-aware) NN learning method is first applied to produce a classification boundary, which is then adjusted by PBI under the geometric mean (G-mean) criterion. For imbalanced data, the geometric mean of the accuracies on the minority and majority classes is considered, which is statistically more suitable than the commonly used accuracy metric. PBI also has the following advantages over traditional imbalance methods: (i) PBI can significantly improve the classification accuracy on the minority class while improving or preserving that on the majority class; (ii) PBI is suitable for large datasets, even with a high imbalance ratio (as extreme as 0.001). To evaluate (i), a new metric called the Majority loss/Minority advance ratio (MMR) is proposed, which measures the accuracy loss on the majority class relative to the accuracy gain on the minority class. Experiments have been conducted with PBI and several imbalance learning methods over benchmark datasets of different sizes, imbalance ratios, and dimensionalities. The experimental results show that PBI outperforms the other imbalance learning methods on almost all datasets.
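Since the abstract defines G-mean as the geometric mean of the per-class accuracies and describes MMR as the loss on the majority class relative to the advance on the minority class, a minimal sketch of both metrics is given below. The function names `gmean` and `mmr`, and the exact MMR formula (majority-class accuracy drop divided by minority-class accuracy gain after boundary adjustment), are assumptions based on this description, not the paper's own definitions.

```python
import numpy as np

def gmean(y_true, y_pred, minority_label=1):
    """Geometric mean of the minority- and majority-class accuracies (recalls)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    min_mask = (y_true == minority_label)
    maj_mask = ~min_mask
    acc_min = (y_pred[min_mask] == y_true[min_mask]).mean()
    acc_maj = (y_pred[maj_mask] == y_true[maj_mask]).mean()
    return np.sqrt(acc_min * acc_maj)

def mmr(y_true, pred_base, pred_adjusted, minority_label=1):
    """Hypothetical MMR: accuracy lost on the majority class divided by
    accuracy gained on the minority class when moving from the baseline
    boundary to the adjusted one (formula assumed from the abstract)."""
    y_true = np.asarray(y_true)
    min_mask = (y_true == minority_label)
    maj_mask = ~min_mask

    def class_acc(pred, mask):
        return (np.asarray(pred)[mask] == y_true[mask]).mean()

    maj_loss = class_acc(pred_base, maj_mask) - class_acc(pred_adjusted, maj_mask)
    min_gain = class_acc(pred_adjusted, min_mask) - class_acc(pred_base, min_mask)
    return maj_loss / min_gain if min_gain > 0 else float("nan")
```

Under this reading, a small MMR is desirable: the adjusted boundary gains minority-class accuracy at little cost to the majority class, while the G-mean rewards balanced accuracy across both classes regardless of the imbalance ratio.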