An oversampling method for multi-class imbalanced data based on composite weights.

Affiliations

School of Automobile, Chang'an University, Xi'an, China.

College of Automobile Engineering, College of Humanities and Information Changchun University of Technology, Changchun, China.

Publication information

PLoS One. 2021 Nov 12;16(11):e0259227. doi: 10.1371/journal.pone.0259227. eCollection 2021.

DOI:10.1371/journal.pone.0259227

PMID:34767567

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8589211/

Abstract

To address oversampling for multi-class small-sample data and to improve classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The algorithm sorts the data within each class of the dataset by the distance from each original data point to the hyperplane. It then performs iterative sampling within each class and inter-class sampling at the boundaries of adjacent classes, using a sampling weight composed of data density and data sorting. Finally, information assignment is performed on all newly generated samples. The algorithm is trained and tested on UCI imbalanced datasets, and composite metrics are established to evaluate the proposed algorithm and other algorithms under a comprehensive evaluation method. The results show that the proposed algorithm balances the multi-class imbalanced data in terms of quantity, and that the newly generated data maintain the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm reaches a higher classification accuracy of about 90%. We conclude that the algorithm is practical and generally applicable to imbalanced multi-class samples.
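The abstract does not give the weight formula, so the following is only a minimal sketch of the intra-class step under stated assumptions, not the authors' implementation. It assumes a known linear hyperplane `w·x + b = 0`, a density proxy based on the mean distance to the `k` nearest same-class neighbours, a blending factor `alpha`, and SMOTE-style interpolation for generating points. The function names `composite_weights` and `oversample_class`, and the choice to favour points that are closer to the hyperplane and in sparser regions, are all hypothetical. Inter-class sampling at class boundaries and the information assignment step are not covered.

```python
import math
import random


def composite_weights(points, w, b, k=3, alpha=0.5):
    """Return one sampling weight per point; the weights sum to 1."""
    n = len(points)
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Distance from each point to the hyperplane w.x + b = 0.
    dist = [abs(sum(wi * xi for wi, xi in zip(w, p)) + b) / norm for p in points]
    # Sort by distance. Assumption: points nearer the hyperplane score higher.
    order = sorted(range(n), key=lambda i: dist[i])
    rank_score = [0.0] * n
    for r, i in enumerate(order):
        rank_score[i] = float(n - r)
    # Density proxy: mean distance to the k nearest same-class neighbours.
    # Assumption: sparser regions (larger mean distance) score higher.
    sparsity = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        sparsity.append(sum(d[:k]) / min(k, len(d)))
    rank_total = sum(rank_score)
    sparse_total = sum(sparsity) or 1.0
    raw = [
        alpha * rank_score[i] / rank_total + (1 - alpha) * sparsity[i] / sparse_total
        for i in range(n)
    ]
    total = sum(raw)
    return [x / total for x in raw]


def oversample_class(points, weights, n_new, k=3, seed=0):
    """Generate n_new synthetic points by weighted seed choice and interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a seed point in proportion to its composite weight.
        i = rng.choices(range(len(points)), weights=weights)[0]
        # Pick one of its k nearest same-class neighbours uniformly.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        j = rng.choice(neighbours)
        # Interpolate between the seed and the neighbour (SMOTE-style).
        t = rng.random()
        synthetic.append(tuple(a + t * (c - a) for a, c in zip(points[i], points[j])))
    return synthetic


# Toy minority class with at least two distinct points (illustrative data only).
minority = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.5), (2.0, 2.0)]
weights = composite_weights(minority, w=(1.0, -1.0), b=0.0)
new_points = oversample_class(minority, weights, n_new=6)
```

Because each synthetic point is an interpolation between two existing points of the same class, it stays inside that class's convex hull, which is one simple way to keep generated data close to the original distribution.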

