Zhou Zhi-Hua, Yu Yang
National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China.
IEEE Trans Syst Man Cybern B Cybern. 2005 Aug;35(4):725-35. doi: 10.1109/tsmcb.2005.845396.
Ensemble learning algorithms train multiple component learners and then combine their predictions. To produce a strong ensemble, the component learners should be both highly accurate and highly diverse. A popular scheme for generating accurate but diverse component learners is to perturb the training data with resampling methods, such as the bootstrap sampling used in bagging. However, this scheme is not very effective for local learners such as nearest-neighbor classifiers, because a slight change in the training data rarely produces local learners that differ much from one another. In this paper, a new ensemble algorithm named Filtered Attribute Subspace based Bagging with Injected Randomness (FASBIR) is proposed for building ensembles of local learners. It uses multimodal perturbation to help generate accurate but diverse component learners. Specifically, FASBIR perturbs the training data with bootstrap sampling, perturbs the input attributes with attribute filtering and attribute subspace selection, and perturbs the learning parameters with randomly configured distance metrics. A large empirical study shows that FASBIR is effective in building ensembles of nearest-neighbor classifiers, and that these ensembles perform better than those built by many other ensemble algorithms.
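The three perturbations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how attributes are scored or filtered, which distance metrics are drawn, or how predictions are combined. The code below therefore assumes a between-class variance ratio as the relevance score, a filter threshold relative to the mean score, Minkowski orders drawn from {1, 2, 3}, 1-nearest-neighbor components, and majority voting. All function and parameter names (`relevance_scores`, `fit_ensemble`, `predict`, `filter_frac`, `subspace_frac`, `orders`) are hypothetical.

```python
import numpy as np

def relevance_scores(X, y):
    """Assumed relevance score: between-class variance / total variance per attribute."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
    total = ((X - overall) ** 2).sum(axis=0)
    return between / np.maximum(total, 1e-12)

def fit_ensemble(X, y, n_members=20, subspace_frac=0.5, filter_frac=0.3,
                 orders=(1, 2, 3), seed=0):
    """Build nearest-neighbor components using three kinds of perturbation."""
    rng = np.random.default_rng(seed)
    # Attribute filtering (assumed rule): drop attributes scoring well below average.
    scores = relevance_scores(X, y)
    kept = np.flatnonzero(scores >= filter_frac * scores.mean())
    n_sub = max(1, int(round(subspace_frac * len(kept))))
    members = []
    for _ in range(n_members):
        # Perturb the training data: bootstrap sample of the rows.
        rows = rng.integers(0, len(X), size=len(X))
        # Perturb the input attributes: random subspace of the filtered attributes.
        cols = rng.choice(kept, size=n_sub, replace=False)
        # Perturb the learning parameters: random Minkowski order (assumed metric family).
        p = int(rng.choice(orders))
        members.append((X[np.ix_(rows, cols)], y[rows], cols, p))
    return members

def predict(members, X_new):
    """Combine the 1-nearest-neighbor predictions of all components by majority vote."""
    votes = []
    for X_tr, y_tr, cols, p in members:
        diff = np.abs(X_new[:, cols][:, None, :] - X_tr[None, :, :])
        # The p-th root is omitted because it does not change the neighbor ranking.
        dist = (diff ** p).sum(axis=2)
        votes.append(y_tr[dist.argmin(axis=1)])
    votes = np.stack(votes, axis=1)
    classes = np.unique(votes)
    counts = np.stack([(votes == c).sum(axis=1) for c in classes], axis=1)
    return classes[counts.argmax(axis=1)]
```

Each component stores its own bootstrap sample, attribute subset, and distance order, so a query is projected onto that component's attributes before neighbors are searched. Numeric attributes are assumed throughout; categorical attributes would need a different distance.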