
Entropy and Confidence-Based Undersampling Boosting Random Forests for Imbalanced Problems.

Author Information

Wang Zhe, Cao Chenjie, Zhu Yujin

Publication Information

IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5178-5191. doi: 10.1109/TNNLS.2020.2964585. Epub 2020 Nov 30.

Abstract

In this article, we propose a novel entropy and confidence-based undersampling boosting (ECUBoost) framework to solve imbalanced problems. The boosting-based ensemble is combined with a new undersampling method to improve generalization performance. To avoid losing informative samples during the data preprocessing of the boosting-based ensemble, ECUBoost uses both confidence and entropy as benchmarks to ensure the validity and structural distribution of the majority samples during undersampling. Moreover, unlike other iterative dynamic resampling methods, the confidence-based ECUBoost can be applied to algorithms without iterations, such as decision trees. Random forests are used as base classifiers in ECUBoost. Finally, experimental results on both artificial data sets and KEEL data sets demonstrate the effectiveness of the proposed method.
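To make the described workflow concrete, below is a minimal, hypothetical Python sketch of the idea in the abstract: each boosting round trains a random forest on all minority samples plus the majority samples that the current model ranks as most informative by a combination of predictive entropy and confidence. The function names, the equal weighting of entropy and confidence, the number of rounds, and the 1:1 undersampling ratio are illustrative assumptions, not the paper's exact ECUBoost procedure.

```python
# Hypothetical sketch of entropy- and confidence-guided undersampling with
# random-forest base classifiers. NOT the authors' ECUBoost implementation;
# the scoring weights, round count, and balanced subset size are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_majority(proba, majority_idx, majority_class):
    """Score majority samples: predictive entropy reflects the structural
    distribution (boundary samples), confidence in the true class reflects
    validity. Equal weighting is an assumption."""
    p = np.clip(proba[majority_idx], 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p), axis=1)   # uncertainty of each majority sample
    confidence = p[:, majority_class]          # probability assigned to the true class
    return entropy + confidence

def ecu_style_boost(X, y, minority_class=1, n_rounds=5, seed=0):
    """Train a small ensemble of random forests, re-undersampling the
    majority class dynamically between rounds (binary labels {0, 1})."""
    rng = np.random.default_rng(seed)
    majority_class = 1 - minority_class
    min_idx = np.where(y == minority_class)[0]
    maj_idx = np.where(y == majority_class)[0]
    ensemble = []

    # First round: random balanced undersampling, since no model exists yet.
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    for _ in range(n_rounds):
        train_idx = np.concatenate([min_idx, keep])
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        ensemble.append(rf)

        # Re-rank the full majority set with the newest forest and keep the
        # top-scoring samples for the next round (dynamic undersampling).
        proba = rf.predict_proba(X)
        scores = rank_majority(proba, maj_idx, majority_class)
        keep = maj_idx[np.argsort(scores)[::-1][: len(min_idx)]]

    def predict(X_new):
        # Simple averaged vote over rounds; the paper's combination rule may differ.
        avg = np.mean([m.predict_proba(X_new) for m in ensemble], axis=0)
        return avg.argmax(axis=1)

    return predict

# Example usage on a synthetic imbalanced problem (hypothetical):
# from sklearn.datasets import make_classification
# X, y = make_classification(weights=[0.9, 0.1], random_state=0)
# predict = ecu_style_boost(X, y, minority_class=1)
# y_hat = predict(X)
```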

