Super learner.

Author information

van der Laan Mark J, Polley Eric C, Hubbard Alan E

Affiliation

University of California, Berkeley, USA.

Publication information

Stat Appl Genet Mol Biol. 2007;6:Article25. doi: 10.2202/1544-6115.1309. Epub 2007 Sep 16.

DOI:10.2202/1544-6115.1309

PMID:17910531

Abstract

When learning a model to predict an outcome from a set of covariates, a statistician has many estimation procedures in their toolbox. Examples of such candidate learners include least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that builds a weighted combination of many candidate learners, called the super learner. This article proposes a fast algorithm for constructing a super learner for prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, the article gives a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as the minimizer of a loss function.
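The idea of using V-fold cross-validation to weight candidate learners can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the weights are computed, so the grid search over convex weights, the squared-error loss, the two toy learners, the simulated data, and all function names (`fit_mean`, `fit_line`, `cv_predictions`, `simplex_grid`, `super_learner`) are assumptions made for illustration.

```python
import random


def fit_mean(xs, ys):
    """Toy candidate learner 1: ignore x and predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Toy candidate learner 2: univariate least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def cv_predictions(learners, xs, ys, v=5):
    """Return z, where z[i][k] is learner k's prediction for point i
    from a fit that did not see point i (V-fold cross-validation)."""
    n = len(xs)
    z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        test = [i for i in range(n) if i % v == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for k, fit in enumerate(learners):
            model = fit(train_x, train_y)
            for i in test:
                z[i][k] = model(xs[i])
    return z


def cv_risk(z, ys, weights):
    """Cross-validated mean squared error of the weighted combination."""
    total = 0.0
    for row, y in zip(z, ys):
        combined = sum(w * p for w, p in zip(weights, row))
        total += (y - combined) ** 2
    return total / len(ys)


def simplex_grid(k, steps):
    """Yield every length-k weight vector with entries that are
    multiples of 1/steps, are non-negative, and sum to 1."""
    def compositions(parts, remaining):
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(parts - 1, remaining - first):
                yield (first,) + rest
    for combo in compositions(k, steps):
        yield tuple(c / steps for c in combo)


def super_learner(learners, xs, ys, v=5, steps=20):
    """Pick convex weights minimizing cross-validated risk (assumed
    selection rule), then refit every learner on the full data."""
    z = cv_predictions(learners, xs, ys, v)
    weights = min(simplex_grid(len(learners), steps),
                  key=lambda w: cv_risk(z, ys, w))
    fits = [fit(xs, ys) for fit in learners]
    return weights, lambda x: sum(w * f(x) for w, f in zip(weights, fits))


# Simulated data with a linear truth, so the line learner should dominate.
random.seed(0)
xs = [random.random() for _ in range(100)]
ys = [2.0 * x + random.gauss(0.0, 0.1) for x in xs]
weights, predict = super_learner([fit_mean, fit_line], xs, ys)
```

Here `weights` holds one weight per candidate learner and `predict` is the combined predictor. A grid search keeps the sketch dependency-free but scales poorly as the number of learners grows; a constrained regression of the outcome on the cross-validated predictions would be a more practical choice.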
