
Entropy and Confidence-Based Undersampling Boosting Random Forests for Imbalanced Problems.

Author Information

Wang Zhe, Cao Chenjie, Zhu Yujin

Publication Information

IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5178-5191. doi: 10.1109/TNNLS.2020.2964585. Epub 2020 Nov 30.

Abstract

In this article, we propose a novel entropy and confidence-based undersampling boosting (ECUBoost) framework to solve imbalanced problems. The boosting-based ensemble is combined with a new undersampling method to improve generalization performance. To avoid losing informative samples during the data preprocessing of the boosting-based ensemble, ECUBoost uses both confidence and entropy as benchmarks to ensure the validity and structural distribution of the majority samples during undersampling. Moreover, unlike other iterative dynamic resampling methods, the confidence-based ECUBoost can be applied to algorithms without iterations, such as decision trees. Random forests are used as base classifiers in ECUBoost. Finally, experimental results on both artificial data sets and KEEL data sets demonstrate the effectiveness of the proposed method.
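To make the described workflow concrete, below is a minimal, hypothetical Python sketch of the idea in the abstract: each boosting round trains a random forest on all minority samples plus the majority samples that the current model ranks as most informative by a combination of predictive entropy and confidence. The function names, the equal weighting of entropy and confidence, the number of rounds, and the 1:1 undersampling ratio are illustrative assumptions, not the paper's exact ECUBoost procedure.

```python
# Hypothetical sketch of entropy- and confidence-guided undersampling with
# random-forest base classifiers. NOT the authors' ECUBoost implementation;
# the scoring weights, round count, and balanced subset size are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_majority(proba, majority_idx, majority_class):
    """Score majority samples: predictive entropy reflects the structural
    distribution (boundary samples), confidence in the true class reflects
    validity. Equal weighting is an assumption."""
    p = np.clip(proba[majority_idx], 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p), axis=1)   # uncertainty of each majority sample
    confidence = p[:, majority_class]          # probability assigned to the true class
    return entropy + confidence

def ecu_style_boost(X, y, minority_class=1, n_rounds=5, seed=0):
    """Train a small ensemble of random forests, re-undersampling the
    majority class dynamically between rounds (binary labels {0, 1})."""
    rng = np.random.default_rng(seed)
    majority_class = 1 - minority_class
    min_idx = np.where(y == minority_class)[0]
    maj_idx = np.where(y == majority_class)[0]
    ensemble = []

    # First round: random balanced undersampling, since no model exists yet.
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    for _ in range(n_rounds):
        train_idx = np.concatenate([min_idx, keep])
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        ensemble.append(rf)

        # Re-rank the full majority set with the newest forest and keep the
        # top-scoring samples for the next round (dynamic undersampling).
        proba = rf.predict_proba(X)
        scores = rank_majority(proba, maj_idx, majority_class)
        keep = maj_idx[np.argsort(scores)[::-1][: len(min_idx)]]

    def predict(X_new):
        # Simple averaged vote over rounds; the paper's combination rule may differ.
        avg = np.mean([m.predict_proba(X_new) for m in ensemble], axis=0)
        return avg.argmax(axis=1)

    return predict

# Example usage on a synthetic imbalanced problem (hypothetical):
# from sklearn.datasets import make_classification
# X, y = make_classification(weights=[0.9, 0.1], random_state=0)
# predict = ecu_style_boost(X, y, minority_class=1)
# y_hat = predict(X)
```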

