IEEE Trans Cybern. 2018 Feb;48(2):689-702. doi: 10.1109/TCYB.2017.2651114. Epub 2017 Jan 19.
Semisupervised learning methods are often adopted to handle datasets with a very small number of labeled samples. However, conventional semisupervised ensemble learning approaches have two limitations: 1) most of them cannot obtain satisfactory results on high-dimensional datasets with limited labels and 2) they usually do not consider how to use an optimization process to enlarge the training set. In this paper, we propose the progressive semisupervised ensemble learning approach (PSEMISEL) to address the above limitations and handle datasets with a very small number of labeled samples. Compared with traditional semisupervised ensemble learning approaches, PSEMISEL is characterized by two properties: 1) it adopts the random subspace technique to investigate the structure of the dataset in the subspaces and 2) it uses a progressive training set generation process and a self-evolutionary sample selection process to enlarge the training set. We also use a set of nonparametric tests to compare different semisupervised ensemble learning methods over multiple datasets. The experimental results on 18 real-world datasets from the University of California, Irvine machine learning repository show that PSEMISEL works well on most of them and outperforms other state-of-the-art approaches on 10 out of 18 datasets.
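The two properties the abstract highlights can be illustrated with a minimal sketch. The code below is not the authors' PSEMISEL algorithm; it is a generic hedged example, assuming scikit-learn decision trees as base learners and the `load_iris` dataset, of (1) training an ensemble in random feature subspaces and (2) progressively enlarging a tiny labeled set with pseudo-labeled samples on which the ensemble votes unanimously. The subspace size, ensemble size, and unanimity criterion are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
n, d = X.shape

# Pretend only a handful of samples carry labels.
seed_idx = rng.choice(n, size=15, replace=False)
pseudo_y = {int(i): int(y[i]) for i in seed_idx}       # sample index -> label
unlabeled = [i for i in range(n) if i not in pseudo_y]

n_views, k = 5, 2                                      # ensemble size, subspace dim
subspaces = [rng.choice(d, size=k, replace=False) for _ in range(n_views)]

for _ in range(3):                                     # a few progressive rounds
    if not unlabeled:
        break
    train_idx = np.array(sorted(pseudo_y))
    y_train = np.array([pseudo_y[i] for i in train_idx])

    # Random-subspace step: each base learner sees only its own
    # random feature subset of the current (enlarged) training set.
    clfs = [DecisionTreeClassifier(random_state=0)
            .fit(X[np.ix_(train_idx, f)], y_train) for f in subspaces]

    # Vote on the unlabeled pool; keep only unanimously labeled samples.
    u = np.array(unlabeled)
    votes = np.stack([c.predict(X[np.ix_(u, f)]) for c, f in zip(clfs, subspaces)])
    maj = np.array([np.bincount(col).argmax() for col in votes.T])
    unanimous = (votes == maj).all(axis=0)

    if not unanimous.any():
        break
    # Progressive training-set enlargement with pseudo-labeled samples.
    for i, lab in zip(u[unanimous], maj[unanimous]):
        pseudo_y[int(i)] = int(lab)
    unlabeled = [i for i in unlabeled if i not in pseudo_y]

print(f"training set grew from 15 to {len(pseudo_y)} samples")
```

The paper's actual method additionally uses an optimization-based, self-evolutionary sample selection process rather than the simple unanimity rule above; this sketch only conveys the overall random-subspace plus progressive-enlargement structure.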