IEEE Trans Cybern. 2019 Jun;49(6):2280-2293. doi: 10.1109/TCYB.2018.2824299. Epub 2018 Apr 20.
Classifying high-dimensional data with very few labels is a challenging task in data mining and machine learning. In this paper, we propose the multiobjective semisupervised classifier ensemble (MOSSCE) approach to address this challenge. First, a multiobjective subspace selection process (MOSSP) in MOSSCE is designed to generate the optimal combination of feature subspaces. Three objective functions are proposed for MOSSP: the relevance of features, the redundancy between features, and the data reconstruction error. Next, MOSSCE generates an auxiliary training set based on sample confidence to improve the performance of the classifier ensemble. Finally, the training set, combined with the auxiliary training set, is used to select the optimal combination of basic classifiers in the ensemble, train the classifier ensemble, and generate the final result. In addition, a diversity analysis of the ensemble learning process is performed, and a set of nonparametric statistical tests is adopted to compare semisupervised classification approaches on multiple datasets. Experiments on 12 gene expression datasets and two large image datasets show that MOSSCE outperforms other state-of-the-art semisupervised classifiers on high-dimensional data.
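The abstract does not define how sample confidence is computed, so the following is only an illustrative sketch of one common way to build a confidence-based auxiliary training set, not the method used in MOSSCE. It assumes that confidence is the fraction of base classifiers agreeing with the majority vote; the function names `vote_confidence` and `build_auxiliary_set`, the threshold value, and the toy data are all hypothetical.

```python
from collections import Counter


def vote_confidence(votes):
    """Return (majority_label, fraction of votes agreeing with it).

    Assumption: confidence is majority-vote agreement among base
    classifiers. The paper's actual confidence measure may differ.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def build_auxiliary_set(unlabeled, ensemble_votes, threshold=0.8):
    """Keep unlabeled samples whose vote confidence meets the threshold.

    unlabeled: list of feature vectors
    ensemble_votes: one list of votes per sample, one vote per base classifier
    Returns a list of (sample, pseudo_label, confidence) tuples.
    """
    auxiliary = []
    for sample, votes in zip(unlabeled, ensemble_votes):
        label, confidence = vote_confidence(votes)
        if confidence >= threshold:
            auxiliary.append((sample, label, confidence))
    return auxiliary


# Toy data (hypothetical): three unlabeled samples, five base classifiers.
unlabeled = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]
votes = [
    ["A", "A", "A", "A", "A"],  # unanimous vote for A
    ["A", "B", "A", "B", "B"],  # split vote, 3 of 5 for B
    ["B", "B", "B", "B", "A"],  # 4 of 5 for B
]

auxiliary = build_auxiliary_set(unlabeled, votes, threshold=0.8)
for sample, label, confidence in auxiliary:
    print(sample, label, confidence)
```

In this sketch the pseudo-labeled samples in `auxiliary` would be appended to the labeled training set before the ensemble is retrained. A stricter threshold admits fewer but more reliable pseudo-labels.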