

Post-boosting of classification boundary for imbalanced data using geometric mean.

Author affiliations

Department of Computer and Information Science, University of Macau, Macau.

Department of Electromechanical Engineering, University of Macau, Macau.

Publication information

Neural Netw. 2017 Dec;96:101-114. doi: 10.1016/j.neunet.2017.09.004. Epub 2017 Sep 14.

Abstract

In this paper, a novel imbalance learning method for binary classes is proposed, named Post-Boosting of classification boundary for Imbalanced data (PBI), which can significantly improve the classification boundary of any trained neural network (NN). The procedure of PBI consists of two simple steps: an (imbalance-aware) NN learning method is first applied to produce a classification boundary, which is then adjusted by PBI under the geometric mean (G-mean) criterion. For imbalanced data, the geometric mean of the accuracies of the minority and majority classes is considered, which is statistically more suitable than overall accuracy. PBI also has the following advantages over traditional imbalance methods: (i) PBI can significantly improve the classification accuracy on the minority class while improving or maintaining that on the majority class; (ii) PBI is suitable for large datasets, even with a high imbalance ratio (up to 0.001). For the evaluation of (i), a new metric called the Majority loss/Minority advance ratio (MMR) is proposed, which evaluates the loss ratio of the majority class relative to the minority class. Experiments have been conducted comparing PBI with several imbalance learning methods over benchmark datasets of different sizes, imbalance ratios, and dimensionalities. The experimental results show that PBI outperforms the other imbalance learning methods on almost all datasets.
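The G-mean criterion the abstract refers to can be illustrated with a short sketch. This is not the paper's implementation, only a minimal computation of the geometric mean of per-class accuracies from true and predicted labels; the function name, label convention, and toy data below are illustrative assumptions:

```python
import math

def g_mean(y_true, y_pred, minority_label=1):
    """Geometric mean of minority-class and majority-class accuracies."""
    # Correctly classified minority samples (true positives on the minority class)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority_label and p == minority_label)
    # Correctly classified majority samples
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != minority_label and p != minority_label)
    n_min = sum(1 for t in y_true if t == minority_label)
    n_maj = len(y_true) - n_min
    acc_min = tp / n_min if n_min else 0.0
    acc_maj = tn / n_maj if n_maj else 0.0
    return math.sqrt(acc_min * acc_maj)

# Imbalanced toy data: 8 majority-class (0) samples, 2 minority-class (1) samples.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]  # one minority sample misclassified
print(round(g_mean(y_true, y_pred), 4))  # sqrt(0.5 * 1.0) ≈ 0.7071
```

Unlike overall accuracy (0.9 here, despite half the minority class being missed), the G-mean drops sharply whenever either class is classified poorly, which is why it is the natural objective for adjusting a boundary on imbalanced data.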

