Radial-Based Oversampling for Multiclass Imbalanced Data Classification.

Author information

Krawczyk Bartosz, Koziarski Michal, Wozniak Michal

Publication information

IEEE Trans Neural Netw Learn Syst. 2020 Aug;31(8):2818-2831. doi: 10.1109/TNNLS.2019.2913673. Epub 2019 Jun 21.

Abstract

Learning from imbalanced data is among the most popular topics in contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts remain relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls for a better understanding of the relationships among classes. In this paper, we propose multiclass radial-based oversampling (MC-RBO), a novel data-sampling algorithm dedicated to multiclass problems. The main novelty of our method lies in using potential functions for generating artificial instances. Contrary to existing multiclass oversampling approaches that use only minority class characteristics, we take into account information coming from all of the classes. The process of artificial instance generation is guided by exploring areas where the value of the mutual class distribution is very small. This way, we ensure a smart oversampling procedure that can cope with difficult data distributions and alleviate the shortcomings of existing methods. The usefulness of the MC-RBO algorithm is evaluated in an extensive experimental study and backed up by a thorough statistical analysis. The obtained results show that by taking into account information coming from all of the classes and conducting smart oversampling, we can significantly improve the process of learning from multiclass imbalanced data.
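The abstract says that artificial instances are generated with potential functions built from all classes, and that the search favours areas where the mutual class value is very small. The sketch below is a minimal, hypothetical illustration of that idea in plain Python. It is not the authors' MC-RBO implementation. It rests on four assumptions:

- Each class contributes a Gaussian radial basis potential.
- "Very small" is read as small in absolute value.
- A greedy random walk keeps a step only if it lowers that absolute value.
- Every class is contrasted with the union of all other classes.

The function names (rbf, mutual_potential, generate_point, multiclass_oversample) and the parameters (gamma, step_size, n_steps, seed) are illustrative inventions.

```python
import math
import random
from collections import Counter


def rbf(distance, gamma):
    # Gaussian radial basis function: 1.0 at distance 0, decaying with spread gamma.
    return math.exp(-((distance / gamma) ** 2))


def mutual_potential(x, own_points, other_points, gamma):
    # Potential of all other classes minus the potential of x's own class.
    # Values near zero mean the two potentials roughly cancel out
    # (or that x is far from all samples).
    other = sum(rbf(math.dist(x, p), gamma) for p in other_points)
    own = sum(rbf(math.dist(x, p), gamma) for p in own_points)
    return other - own


def generate_point(own_points, other_points, gamma, step_size, n_steps, rng):
    # Start from a random real sample of the class being oversampled.
    point = list(rng.choice(own_points))
    score = abs(mutual_potential(point, own_points, other_points, gamma))
    for _ in range(n_steps):
        # Propose a small random translation in every dimension.
        candidate = [v + rng.uniform(-step_size, step_size) for v in point]
        candidate_score = abs(mutual_potential(candidate, own_points, other_points, gamma))
        # Assumed acceptance rule: keep the step only if |potential| shrinks.
        if candidate_score < score:
            point, score = candidate, candidate_score
    return point


def multiclass_oversample(X, y, gamma=1.0, step_size=0.05, n_steps=20, seed=0):
    # Hypothetical one-vs-rest scheme: every class is contrasted with the union
    # of all other classes and grown to the size of the largest class.
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_new = [list(x) for x in X]
    y_new = list(y)
    for label, count in counts.items():
        own = [x for x, c in zip(X, y) if c == label]
        other = [x for x, c in zip(X, y) if c != label]
        for _ in range(target - count):
            X_new.append(generate_point(own, other, gamma, step_size, n_steps, rng))
            y_new.append(label)
    return X_new, y_new


# Toy three-class example: 4 samples of "a", 2 of "b", 1 of "c".
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (1.0, 1.0), (1.1, 1.0), (2.0, 0.0)]
y = ["a", "a", "a", "a", "b", "b", "c"]
X_res, y_res = multiclass_oversample(X, y)
print(dict(Counter(y_res)))  # → {'a': 4, 'b': 4, 'c': 4}
```

Because each proposal moves every coordinate by at most step_size, the product step_size × n_steps caps how far a synthetic point can drift from its seed. This cap matters under the absolute-value rule, since the potential also approaches zero far away from all data. Minimizing the signed potential instead is an equally plausible reading of the abstract, and the published method may organise classes differently than this one-vs-rest loop.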
