Rodríguez Juan J, Kuncheva Ludmila I, Alonso Carlos J
Escuela Politécnica Superior, Edificio C, Universidad de Burgos, c/ Francisco de Vitoria s/n, 09006 Burgos, Spain.
IEEE Trans Pattern Anal Mach Intell. 2006 Oct;28(10):1619-30. doi: 10.1109/TPAMI.2006.211.
We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to simultaneously encourage individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also by using the whole data set to train each base classifier. Using WEKA, we examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were favorable to Rotation Forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that Rotation Forest ensembles construct individual classifiers that are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, and sometimes more accurate as well.
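The rotation step described above can be illustrated with a minimal sketch. It follows only what the abstract states (random split of the features into K subsets, PCA on each subset, all components kept) and is not the authors' WEKA implementation. The function name `rotation_matrix`, the use of NumPy, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def rotation_matrix(X, k, rng):
    """Build one rotation from PCA on k random, disjoint feature subsets.

    Hypothetical helper: a simplified reading of the abstract, not the
    published implementation.
    """
    n_features = X.shape[1]
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and the number of features")
    # Randomly split the feature indices into k disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), k)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        # PCA on this subset: eigenvectors of its covariance matrix.
        Xs = X[:, idx] - X[:, idx].mean(axis=0)
        cov = np.atleast_2d(np.cov(Xs, rowvar=False))
        _, vecs = np.linalg.eigh(cov)
        # Keep all components, so each block is a full orthogonal matrix.
        R[np.ix_(idx, idx)] = vecs
    return R

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
R = rotation_matrix(X, k=3, rng=rng)
X_rot = X @ R  # new features for one base classifier
```

In an ensemble, each base classifier would get its own random split and therefore its own rotation, and a decision tree would then be trained on the corresponding `X_rot`. Because every block keeps all of its components, `R` is orthogonal, so the rotation changes the axes without discarding information.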