

Post-boosting of classification boundary for imbalanced data using geometric mean.

Author affiliations

Department of Computer and Information Science, University of Macau, Macau.

Department of Electromechanical Engineering, University of Macau, Macau.

Publication information

Neural Netw. 2017 Dec;96:101-114. doi: 10.1016/j.neunet.2017.09.004. Epub 2017 Sep 14.

Abstract

In this paper, a novel imbalance learning method for binary classes is proposed, named Post-Boosting of classification boundary for Imbalanced data (PBI), which can significantly improve the classification boundary of any trained neural network (NN). The procedure of PBI consists of two simple steps: an (imbalance-aware) NN learning method is first applied to produce a classification boundary, which is then adjusted by PBI under the geometric mean (G-mean) criterion. For imbalanced data, the geometric mean of the accuracies of the minority and majority classes is considered, which is statistically more suitable than overall accuracy. PBI also has the following advantages over traditional imbalance methods: (i) PBI can significantly improve the classification accuracy on the minority class while improving or maintaining that on the majority class; (ii) PBI is suitable for large datasets, even with a high imbalance ratio (up to 0.001). For the evaluation of (i), a new metric called the Majority loss/Minority advance ratio (MMR) is proposed, which evaluates the loss ratio of the majority class relative to the minority class. Experiments have been conducted comparing PBI with several imbalance learning methods over benchmark datasets of different sizes, imbalance ratios, and dimensionalities. The experimental results show that PBI outperforms the other imbalance learning methods on almost all datasets.
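The G-mean criterion the abstract refers to can be illustrated with a short sketch. This is not the paper's implementation, only a minimal computation of the geometric mean of per-class accuracies from true and predicted labels; the function name, label convention, and toy data below are illustrative assumptions:

```python
import math

def g_mean(y_true, y_pred, minority_label=1):
    """Geometric mean of minority-class and majority-class accuracies."""
    # Correctly classified minority samples (true positives on the minority class)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority_label and p == minority_label)
    # Correctly classified majority samples
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != minority_label and p != minority_label)
    n_min = sum(1 for t in y_true if t == minority_label)
    n_maj = len(y_true) - n_min
    acc_min = tp / n_min if n_min else 0.0
    acc_maj = tn / n_maj if n_maj else 0.0
    return math.sqrt(acc_min * acc_maj)

# Imbalanced toy data: 8 majority-class (0) samples, 2 minority-class (1) samples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one minority sample misclassified
print(round(g_mean(y_true, y_pred), 4))  # sqrt(0.5 * 1.0) ≈ 0.7071
```

Unlike overall accuracy (0.9 here, despite half the minority class being missed), the G-mean drops sharply whenever either class is classified poorly, which is why it is the natural objective for adjusting a boundary on imbalanced data.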

