Ayerdi Borja, Graña Manuel
Computational Intelligence Group, UPV/EHU, Spain.
Neural Netw. 2014 Apr;52:33-42. doi: 10.1016/j.neunet.2014.01.003. Epub 2014 Jan 13.
This paper proposes the Hybrid Extreme Rotation Forest (HERF), an innovative ensemble learning algorithm for classification problems that combines classical Decision Trees with the recently proposed Extreme Learning Machine (ELM) training of Neural Networks. In the HERF algorithm, training each individual classifier involves two steps: first, computing a randomized rotation transformation of the training data; second, training the individual classifier on the rotated data. The testing data is subjected to the same transformation as the training data, and this transformation is specific to each classifier in the ensemble. The experimental design in this paper involves (a) comparing two factorization approaches for computing the randomized rotation matrix, Principal Component Analysis (PCA) and Quartimax, (b) assessing the effect of data normalization and of bootstrap selection of the training data, and (c) testing all variants of single and combined ELM and decision trees, including Regularized ELM. This experimental design effectively includes other state-of-the-art ensemble approaches in the comparison, such as Voting ELM and Random Forest. We report extensive results over a collection of machine learning benchmark databases. Ranking the cross-validation results per experimental dataset and classifier tested shows that HERF significantly improves over the other state-of-the-art ensemble classifiers. We also find that data rotation with Quartimax improves over PCA, and that the approach is relatively insensitive to regularization, which may be attributable to the de facto regularization performed by the ensemble approach.
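The two-step training described above (a randomized rotation per classifier, then training on the rotated data) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not specify: the rotation is built by running PCA on disjoint random feature subsets of a bootstrap sample, in the style of Rotation Forest; only ELM members are used, so the decision tree members and the Quartimax variant are omitted; predictions are combined by majority vote. All names (random_rotation, ELM, fit_ensemble, predict_ensemble) and default parameters are hypothetical.

```python
import numpy as np


def random_rotation(X, n_subsets, rng):
    """Build an orthogonal matrix from PCA on disjoint random feature subsets.

    Assumption: a Rotation-Forest-style construction; the abstract only says
    the rotation is randomized and computed by PCA or Quartimax.
    """
    n_samples, n_features = X.shape
    R = np.zeros((n_features, n_features))
    perm = rng.permutation(n_features)
    for idx in np.array_split(perm, n_subsets):
        if idx.size == 0:
            continue
        # Bootstrap the rows so each subset sees a different sample.
        rows = rng.integers(0, n_samples, size=n_samples)
        block = X[np.ix_(rows, idx)]
        block = block - block.mean(axis=0)
        # The right singular vectors are the principal axes of the block.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        R[np.ix_(idx, idx)] = vt.T
    return R


class ELM:
    """Basic single-hidden-layer ELM: random hidden weights, least-squares output."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y, n_classes):
        # Hidden-layer weights are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot class targets
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def fit_ensemble(X, y, n_members=10, n_subsets=2, n_hidden=20, seed=0):
    """Train each member on its own rotated copy of the training data."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    members = []
    for _ in range(n_members):
        R = random_rotation(X, n_subsets, rng)
        members.append((R, ELM(n_hidden, rng).fit(X @ R, y, n_classes)))
    return members, n_classes


def predict_ensemble(members, n_classes, X):
    """Apply each member's own rotation to the test data, then majority-vote."""
    votes = np.zeros((X.shape[0], n_classes))
    for R, clf in members:
        votes[np.arange(X.shape[0]), clf.predict(X @ R)] += 1
    return votes.argmax(axis=1)
```

Calling fit_ensemble(X_train, y_train) with integer class labels starting at 0 returns the list of (rotation, classifier) pairs together with the number of classes; predict_ensemble then reuses each stored rotation on the test data, mirroring the requirement that test data undergo the same classifier-specific transformation as the training data.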