Ishwaran Hemant, Malley James D
Division of Biostatistics, University of Miami, 1120 NW 14th Street, Miami, FL 33136, USA.
Center for Information Technology, National Institutes of Health, Bethesda, MD 20892, USA.
BioData Min. 2014 Dec 18;7(1):28. doi: 10.1186/s13040-014-0028-y. eCollection 2014.
A synthetic random forest is defined as a kind of hyperforest. First, a collection of random forests is constructed, each with a different terminal nodesize and each generating one synthetic feature. The synthetic forest is then calculated using these new synthetic features as inputs, along with the original features.
Using a large collection of regression and multiclass datasets, we show that synthetic random forests outperform both conventional random forests and the optimized forest from the regression portfolio.
Synthetic forests remove the need for tuning random forests, with no additional effort on the part of the researcher. Importantly, the synthetic forest does this with evidently no loss in prediction performance compared to a well-optimized single random forest.
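The construction described above can be sketched in a few dozen lines of plain Python. This is a minimal illustration for regression only, not the authors' implementation. Several details are assumptions that the abstract does not state: each base forest's synthetic feature is taken to be its out-of-bag prediction on the training rows, `nodesize` is treated as the minimum number of rows in a terminal node, one third of the features are tried at each split, and the final forest uses a hypothetical `final_nodesize` parameter. All function names are made up for this example.

```python
import random
from statistics import fmean

def sse(values):
    """Sum of squared errors around the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, nodesize, rng):
    """Grow a regression tree whose terminal nodes keep at least `nodesize` rows."""
    best = None
    n_features = len(X[0])
    # Assumption: try a random third of the features at each split.
    for j in rng.sample(range(n_features), max(1, n_features // 3)):
        cuts = sorted({row[j] for row in X})
        for lo, hi in zip(cuts, cuts[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if min(len(left), len(right)) < nodesize:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return fmean(y)  # no admissible split: terminal node
    _, j, t, left, right = best
    return (j, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], nodesize, rng),
            fit_tree([X[i] for i in right], [y[i] for i in right], nodesize, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are floats
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_forest(X, y, nodesize, ntree, rng):
    """Return (trees, out-of-bag prediction for every training row)."""
    n = len(X)
    trees, oob = [], [[] for _ in range(n)]
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = fit_tree([X[i] for i in idx], [y[i] for i in idx], nodesize, rng)
        trees.append(tree)
        for i in set(range(n)) - set(idx):  # rows this tree never saw
            oob[i].append(predict_tree(tree, X[i]))
    fallback = fmean(y)  # used only if a row was never out of bag
    return trees, [fmean(p) if p else fallback for p in oob]

def predict_forest(trees, x):
    return fmean(predict_tree(t, x) for t in trees)

def fit_synthetic_forest(X, y, nodesizes, ntree=25, final_nodesize=5, seed=0):
    rng = random.Random(seed)
    # One base forest per nodesize; each contributes one synthetic feature.
    base = [fit_forest(X, y, ns, ntree, rng) for ns in nodesizes]
    # Augment the original features with the synthetic ones.
    X_aug = [list(row) + [oob[i] for _, oob in base] for i, row in enumerate(X)]
    final_trees, _ = fit_forest(X_aug, y, final_nodesize, ntree, rng)
    return [trees for trees, _ in base], final_trees

def predict_synthetic(model, x):
    base_forests, final_trees = model
    synthetic = [predict_forest(trees, x) for trees in base_forests]
    return predict_forest(final_trees, list(x) + synthetic)

# Toy data, generated here purely for illustration.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [row[0] ** 2 + row[1] + data_rng.gauss(0, 0.1) for row in X]

model = fit_synthetic_forest(X, y, nodesizes=[1, 5, 10])
print(predict_synthetic(model, [0.5, 0.2, 0.0]))
```

At prediction time, the sketch passes a new observation through every base forest to get its synthetic features, then passes the augmented vector to the final forest. Whether the original method uses the same convention for new data is not stated in the abstract.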