IEEE Trans Neural Netw Learn Syst. 2016 May;27(5):952-65. doi: 10.1109/TNNLS.2015.2430821.
Among the many ensemble learning techniques, boosting and bagging are the most popular sampling-based ensemble approaches for classification problems. Boosting is considered stronger than bagging on noise-free data sets with complex class structures, whereas bagging is more robust than boosting when noisy data are present. In this paper, we extend both ensemble approaches to clustering tasks and propose a novel hybrid sampling-based clustering ensemble that combines the strengths of boosting and bagging. In our approach, the input partitions are generated iteratively via a hybrid process inspired by both boosting and bagging. A novel consensus function is then proposed that encodes the local and global cluster structure of the input partitions into a single representation and applies a single clustering algorithm to this representation to obtain the consolidated consensus partition. Our approach has been evaluated on 2-D synthetic data, a collection of benchmark data sets, and real-world facial recognition data sets; the results show that the proposed technique outperforms existing benchmark methods on a variety of clustering tasks.
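To make the overall pipeline concrete, the following is a minimal sketch of a generic hybrid sampling-based clustering ensemble. It is not the authors' algorithm: the abstract does not give the exact sampling weights or the novel consensus function, so the boosting-style reweighting rule, the use of k-means as the base clusterer, and the standard co-association consensus below are all stand-in assumptions used purely for illustration.

```python
# Minimal sketch of a hybrid (boosting + bagging) clustering ensemble.
# NOT the paper's method: the reweighting rule and the co-association
# consensus are generic stand-ins for the components the abstract names.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hybrid_clustering_ensemble(X, n_clusters, n_partitions=20,
                               sample_frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    weights = np.ones(n) / n                  # boosting-style sampling weights
    coassoc = np.zeros((n, n))                # accumulated co-association votes
    counts = np.zeros((n, n))                 # how often each pair was co-sampled

    for _ in range(n_partitions):
        # Bagging-style resampling, biased by the boosting-style weights.
        idx = rng.choice(n, size=int(sample_frac * n), replace=True, p=weights)
        idx = np.unique(idx)
        km = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=int(rng.integers(0, 10**6)))
        labels = km.fit_predict(X[idx])

        # Record which co-sampled pairs were placed in the same cluster.
        same = (labels[:, None] == labels[None, :]).astype(float)
        coassoc[np.ix_(idx, idx)] += same
        counts[np.ix_(idx, idx)] += 1.0

        # Boosting-style reweighting: up-weight points far from their centroid
        # (a crude proxy for "hard to cluster"; the paper's criterion differs).
        dist = np.linalg.norm(X[idx] - km.cluster_centers_[labels], axis=1)
        weights[idx] *= np.exp(dist / (dist.mean() + 1e-12))
        weights /= weights.sum()

    # Consensus step: average-linkage clustering on (1 - co-association).
    sim = np.divide(coassoc, counts, out=np.zeros_like(coassoc), where=counts > 0)
    np.fill_diagonal(sim, 1.0)
    Z = linkage(squareform(1.0 - sim, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

In this sketch, bagging contributes the repeated resampling of the data, boosting contributes the adaptive sampling weights that concentrate later partitions on points that were clustered poorly, and the co-association matrix plays the role of the consensus representation on which a single final clustering is run.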