Bien Jacob, Taylor Jonathan, Tibshirani Robert
Cornell University, Stanford University and Stanford University.
Ann Stat. 2013 Jun;41(3):1111-1141. doi: 10.1214/13-AOS1096.
We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity (the number of nonzero coefficients) and practical sparsity (the number of raw variables one must measure to make a new prediction). Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
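The distinction between counting nonzero coefficients and counting the raw variables that must be measured can be illustrated with a small sketch. It is not taken from the paper or from the hierNet package. The function names, the representation of a fitted model as a main-effect vector `beta` plus a symmetric interaction matrix `theta` with zero diagonal, and the example coefficients are all assumptions made for illustration.

```python
import numpy as np

def count_nonzero_coefficients(beta, theta):
    """Number of nonzero main effects plus nonzero distinct interactions.

    Assumes theta is symmetric with a zero diagonal, so each interaction
    is counted once via the strict upper triangle.
    """
    upper = np.triu(theta, k=1)
    return int(np.count_nonzero(beta) + np.count_nonzero(upper))

def count_variables_to_measure(beta, theta):
    """Number of raw variables that appear in at least one nonzero term."""
    in_main = beta != 0
    in_interaction = (theta != 0).any(axis=0) | (theta != 0).any(axis=1)
    return int((in_main | in_interaction).sum())

def satisfies_hierarchy(beta, theta):
    """True if every nonzero interaction (j, k) has beta[j] or beta[k] nonzero."""
    j_idx, k_idx = np.nonzero(theta)
    return bool(np.all((beta[j_idx] != 0) | (beta[k_idx] != 0)))

def symmetric_interactions(p, pairs, value=0.7):
    """Hypothetical helper: build a symmetric p x p interaction matrix."""
    theta = np.zeros((p, p))
    for j, k in pairs:
        theta[j, k] = theta[k, j] = value
    return theta

p = 5

# Hierarchical model: main effects on x0 and x1, interaction x0:x1.
beta_h = np.array([1.5, -2.0, 0.0, 0.0, 0.0])
theta_h = symmetric_interactions(p, [(0, 1)])

# Non-hierarchical model: main effect on x0, interactions x1:x2 and x3:x4.
beta_n = np.array([1.5, 0.0, 0.0, 0.0, 0.0])
theta_n = symmetric_interactions(p, [(1, 2), (3, 4)])

for name, beta, theta in [("hierarchical", beta_h, theta_h),
                          ("non-hierarchical", beta_n, theta_n)]:
    print(name,
          count_nonzero_coefficients(beta, theta),
          count_variables_to_measure(beta, theta),
          satisfies_hierarchy(beta, theta))
```

Both toy models have three nonzero coefficients, but the hierarchical one needs only two raw variables for a new prediction, whereas the non-hierarchical one needs all five.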