A cluster-based hybrid sampling approach for imbalanced data classification.

Authors

Feng Shou, Zhao Chunhui, Fu Ping

Affiliations

College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China.

School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China.

Publication information

Rev Sci Instrum. 2020 May 1;91(5):055101. doi: 10.1063/5.0008935.

Abstract

When instrumental data are processed with classification approaches, imbalanced datasets are usually a challenge. Because minority class instances can be overwhelmed by majority class instances, training a typical classifier directly on such a dataset may yield poor results on the minority class. We propose CUSS (Cluster-based Under-sampling and SMOTE), a cluster-based hybrid sampling approach for imbalanced dataset classification. CUSS is a data-level method and differs from previously proposed hybrid methods. A new cluster-based under-sampling method is designed for CUSS, and a new strategy is proposed for setting the expected instance number according to the data distribution in the original training dataset. The proposed method is compared with five other popular resampling methods on 15 datasets with different instance numbers and imbalance ratios. The experimental results show that CUSS performs well and outperforms the other state-of-the-art methods.
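The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of a cluster-based hybrid sampler, not the authors' CUSS implementation. It assumes plain k-means on the majority class, keeps the instances closest to each centroid, and then oversamples the minority class with SMOTE-style interpolation. The function names, the choice of `k`, and the midpoint rule for the expected instance number are all hypothetical; the paper proposes its own distribution-based strategy, which is not described here.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    centroids = rng.sample(points, k)

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return assign(), centroids


def cluster_undersample(majority, target, k, rng):
    """Keep about `target` majority points, spread over clusters in
    proportion to cluster size, preferring points near each centroid.
    Rounding the per-cluster quotas can move the total off by a few."""
    labels, centroids = kmeans(majority, k, rng)
    kept = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        quota = round(target * len(members) / len(majority))
        members.sort(key=lambda p: math.dist(p, centroids[c]))
        kept.extend(members[:quota])
    return kept


def smote_like(minority, target, rng, k_neighbors=3):
    """SMOTE-style oversampling: interpolate between a minority point
    and one of its nearest minority neighbours until `target` is hit.
    Requires at least two minority points."""
    synthetic = []
    while len(minority) + len(synthetic) < target:
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k_neighbors]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(b + gap * (o - b)
                               for b, o in zip(base, other)))
    return minority + synthetic


def cuss_sketch(majority, minority, k=3, seed=0):
    """Hybrid resampling under the assumptions stated above."""
    rng = random.Random(seed)
    # Hypothetical rule: aim both classes at the midpoint of their sizes.
    target = (len(majority) + len(minority)) // 2
    return (cluster_undersample(majority, target, k, rng),
            smote_like(minority, target, rng))


# Toy data: 90 majority points around (0, 0), 10 minority around (3, 3).
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
minority = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(10)]
new_majority, new_minority = cuss_sketch(majority, minority)
print(len(new_majority), len(new_minority))
```

In this sketch the under-sampling step only selects existing majority instances, while the oversampling step appends synthetic points to the original minority instances, so both classes end up near the same size. A classifier would then be trained on the resampled set rather than on the original imbalanced one.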
