IEEE Trans Cybern. 2022 Oct;52(10):10444-10457. doi: 10.1109/TCYB.2021.3070005. Epub 2022 Sep 19.
This article presents "random space division sampling" (RSDS), a simple and easy-to-implement sampling method for classification based on the idea of random space division. RSDS efficiently distinguishes label-noise points, inner points, and boundary points, and returns the boundary points as the sampled result. This makes it the first general sampling method for classification that not only reduces the data size but also improves the classification accuracy of a classifier, especially in label-noisy classification. Here, "general" means that it is not restricted to any specific classifier or dataset (regardless of whether a dataset is linear or not). Furthermore, because its time complexity is lower than that of most classifiers, RSDS can be applied online to accelerate most classifiers. RSDS can also be used as an undersampling method for imbalanced classification. Experimental results on benchmark datasets demonstrate its effectiveness and efficiency. The code for RSDS and the comparison algorithms is available at https://github.com/syxiaa/RSDS.
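The abstract does not spell out the procedure, so the following is only a rough sketch of how random space division could separate the three kinds of points. It is not the authors' algorithm (their repository holds that). Everything here is an assumption made for illustration: the function names `random_cells` and `rsds_like_sample`, the axis-aligned random cuts, the voting scheme, and the parameters `n_rounds`, `max_leaf`, `boundary_frac`, and `noise_frac`. The idea illustrated is that a point which repeatedly lands in a mixed-label cell is probably near a class boundary, while a point that is almost always the odd one out in its cell is probably mislabeled.

```python
import numpy as np

def random_cells(X, max_leaf, rng):
    """Split the points into cells using random axis-aligned cuts (illustrative)."""
    cells, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        lo, hi = X[idx].min(axis=0), X[idx].max(axis=0)
        splittable = np.flatnonzero(hi > lo)  # features that still vary in this cell
        if len(idx) <= max_leaf or splittable.size == 0:
            cells.append(idx)
            continue
        d = rng.choice(splittable)
        cut = rng.uniform(lo[d], hi[d])
        if cut >= hi[d]:  # guard against floating-point rounding up to hi
            cut = lo[d]
        go_left = X[idx, d] <= cut  # both sides are non-empty because lo <= cut < hi
        stack.append(idx[go_left])
        stack.append(idx[~go_left])
    return cells

def rsds_like_sample(X, y, n_rounds=30, max_leaf=3,
                     boundary_frac=0.5, noise_frac=0.9, seed=0):
    """Return boolean masks (boundary, noise, inner); thresholds are assumed heuristics."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    mixed = np.zeros(len(X))     # rounds in which the point's cell held several labels
    minority = np.zeros(len(X))  # rounds in which the point disagreed with a clear majority
    for _ in range(n_rounds):
        for idx in random_cells(X, max_leaf, rng):
            labels, counts = np.unique(y[idx], return_counts=True)
            if len(labels) == 1:
                continue  # pure cell: no evidence of a boundary or of noise
            mixed[idx] += 1
            if (counts == counts.max()).sum() == 1:  # a unique majority label exists
                major = labels[counts.argmax()]
                minority[idx[y[idx] != major]] += 1
    noise = minority / n_rounds >= noise_frac
    boundary = (mixed / n_rounds >= boundary_frac) & ~noise
    inner = ~(noise | boundary)
    return boundary, noise, inner

if __name__ == "__main__":
    demo_rng = np.random.default_rng(0)
    X = np.vstack([demo_rng.normal(0.0, 1.0, (100, 2)),
                   demo_rng.normal(3.0, 1.0, (100, 2))])
    y = np.repeat([0, 1], 100)
    boundary, noise, inner = rsds_like_sample(X, y)
    print("kept", int(boundary.sum()), "of", len(X), "points as the sample")
```

A classifier would then be trained on `X[boundary]` and `y[boundary]` only. The thresholds trade sample size against coverage of the boundary and would need tuning on real data; the published method should be preferred for any actual experiment.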