Thomas Kneib, Torsten Hothorn, Gerhard Tutz
Institut für Statistik, Ludwig-Maximilians-Universität München, München, Germany.
Biometrics. 2009 Jun;65(2):626-34. doi: 10.1111/j.1541-0420.2008.01112.x.
Model choice and variable selection are major concerns in practical regression analyses and arise in many biometric applications, such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection through a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. To allow a fair comparison between model terms, every smooth term is represented as the sum of a parametric component and a smooth component with one degree of freedom. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
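The selection mechanism can be illustrated with a minimal sketch of componentwise L2 boosting. This is a simplified stand-in, not the algorithm of the paper: it assumes squared-error loss and uses one simple linear base learner per covariate instead of penalized splines or tensor product surfaces. The function name `componentwise_l2_boost`, the step length `nu`, and the simulated data are hypothetical choices made for illustration only.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise L2 boosting with one linear base learner per column of X.

    Illustrative assumption: squared-error loss and simple linear base
    learners; the paper itself works with penalized-spline base learners.
    """
    n, p = X.shape
    # Center the columns so each base learner is a no-intercept regression.
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    col_ss = (Xc ** 2).sum(axis=0)  # sum of squares of each centered column
    offset = y.mean()
    coef = np.zeros(p)
    resid = y - offset
    for _ in range(n_iter):
        # Fit the current residuals on each column separately.
        beta = Xc.T @ resid / col_ss
        # Reduction in residual sum of squares offered by each candidate.
        gain = beta ** 2 * col_ss
        j = int(np.argmax(gain))
        # Update only the best-fitting component, damped by the step length.
        coef[j] += nu * beta[j]
        resid -= nu * beta[j] * Xc[:, j]
    intercept = offset - x_mean @ coef
    return intercept, coef


# Simulated data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

intercept, coef = componentwise_l2_boost(X, y, n_iter=50, nu=0.1)
selected = np.flatnonzero(coef)  # covariates picked at least once
```

Because only the best-fitting component is updated in each iteration, covariates that are never chosen keep a coefficient of exactly zero when boosting is stopped early, which is how this kind of procedure performs variable selection. Giving every base learner the same flexibility, as the abstract describes with one degree of freedom per smooth component, keeps the comparison in the `argmax` step from favoring more flexible terms.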