Department of Epidemiology, University of Pittsburgh, 130 DeSoto Street, 503 Parran Hall, Pittsburgh, PA, 15261, USA.
Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, USA.
Eur J Epidemiol. 2018 May;33(5):459-464. doi: 10.1007/s10654-018-0390-z. Epub 2018 Apr 10.
Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into a host of methods, among which is the "Super Learner". Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although Super Learner is relatively simple in nature, its use by epidemiologists has been hampered by limited understanding of its conceptual and technical details. We work step-by-step through two examples to illustrate concepts and address common concerns.
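The weighting step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code or the API of any Super Learner package. It assumes squared-error loss (the mean squared error objective mentioned in the abstract), a hypothetical two-member library (`fit_mean`, a training-mean predictor, and `fit_ols`, a one-variable least-squares line), simple interleaved V-fold splits, and convex weights (non-negative and summing to one), which is one common choice. With only two candidates the optimal convex weight has a closed form; larger libraries are usually handled with non-negative least squares or a quadratic program instead.

```python
"""Minimal Super Learner sketch: squared-error loss, two hypothetical candidates."""


def fit_mean(xs, ys):
    """Hypothetical candidate: ignore x and predict the training mean of y."""
    mu = sum(ys) / len(ys)
    return lambda x: mu


def fit_ols(xs, ys):
    """Hypothetical candidate: simple least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx if sxx > 0 else 0.0
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


def cv_predictions(learner, xs, ys, v):
    """Out-of-fold predictions from V-fold cross-validation (fold = index mod v)."""
    preds = [0.0] * len(xs)
    for fold in range(v):
        train = [i for i in range(len(xs)) if i % v != fold]
        held_out = [i for i in range(len(xs)) if i % v == fold]
        f = learner([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = f(xs[i])
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def super_learner(xs, ys, learners, v=5):
    """Return (weight on learners[0], CV risks, fitted ensemble) for a 2-learner library."""
    z1, z2 = (cv_predictions(learner, xs, ys, v) for learner in learners)
    # Minimize sum (y - a*z1 - (1 - a)*z2)^2 over a in [0, 1]. This is a convex
    # quadratic in one variable, so clipping the unconstrained minimizer is exact.
    den = sum((p - q) ** 2 for p, q in zip(z1, z2))
    num = sum((y - q) * (p - q) for y, p, q in zip(ys, z1, z2))
    alpha = min(1.0, max(0.0, num / den)) if den > 0 else 0.5
    risks = {
        "learner0": mse(ys, z1),
        "learner1": mse(ys, z2),
        "ensemble": mse(ys, [alpha * p + (1 - alpha) * q for p, q in zip(z1, z2)]),
    }
    # Refit each candidate on all the data and combine with the CV-chosen weight.
    f1, f2 = (learner(xs, ys) for learner in learners)
    return alpha, risks, lambda x: alpha * f1(x) + (1 - alpha) * f2(x)


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [2 * x + 1 + ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
    alpha, risks, predict = super_learner(xs, ys, [fit_ols, fit_mean], v=5)
    print("weight on OLS:", round(alpha, 3), "CV risks:", risks)
```

Because the endpoints a = 0 and a = 1 reproduce each single candidate, the cross-validated risk of the weighted combination can never exceed that of the best single candidate in this sketch. On the roughly linear toy data above, most of the weight should go to the least-squares line.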