Groll A, Tutz G
Department of Statistics, University of Munich, Akademiestrasse 1, 80799 Munich, Germany.
Methods Inf Med. 2012;51(2):168-77. doi: 10.3414/ME11-02-0021. Epub 2012 Mar 1.
With the emergence of semi- and nonparametric regression, the generalized linear mixed model has been extended to account for additive predictors. However, available fitting methods fail in high-dimensional settings where many explanatory variables are present. We extend the concept of boosting to generalized additive mixed models and present an appropriate algorithm that uses two different approaches for fitting the variance components of the random effects.
The main tool developed is likelihood-based componentwise boosting, which enforces variable selection in generalized additive mixed models. In contrast to common procedures, the resulting methods can be used in high-dimensional settings where many covariates are available and the form of their influence is unknown. The complexity of the resulting estimators is determined by information criteria. The performance of the methods is investigated in simulation studies for binary and Poisson responses, with comparisons to alternative approaches, and the methods are applied to real-world clinical data.
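To make the two ingredients above concrete (componentwise selection of one covariate per boosting step, and choosing the number of steps by an information criterion), here is a minimal NumPy sketch. It is not the authors' algorithm: it assumes a Poisson response with log link, uses linear base learners instead of penalized spline bases, omits the random effects and their variance-component updates entirely, uses a fixed ridge penalty `lam`, refits the intercept exactly after each step, and measures complexity by the trace of an approximate boosting hat matrix. The function names `poisson_deviance` and `componentwise_boost_poisson` are hypothetical.

```python
import numpy as np


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y log(y/mu) - (y - mu)], using 0 * log 0 = 0."""
    safe_y = np.where(y > 0, y, 1.0)
    term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))


def componentwise_boost_poisson(X, y, lam=5000.0, max_iter=200):
    """Componentwise likelihood-based boosting, Poisson model with log link.

    Hypothetical simplification: linear base learners, no random effects.
    Returns the coefficient path, the component chosen in each step, and
    the AIC-optimal number of steps.
    """
    n, p = X.shape
    eta = np.full(n, np.log(y.mean()))   # intercept-only start
    beta = np.zeros(p)
    H = np.full((n, n), 1.0 / n)         # hat matrix of the intercept-only fit (trace 1)
    aic = [poisson_deviance(y, np.exp(eta)) + 2.0 * np.trace(H)]
    path, selected = [beta.copy()], []

    for _ in range(max_iter):
        mu = np.exp(eta)
        score = X.T @ (y - mu)           # score of each single component
        fisher = (X ** 2).T @ mu + lam   # penalized Fisher information per component
        step = score / fisher            # one penalized Fisher-scoring step each
        # Select the component whose step lowers the deviance the most.
        devs = [poisson_deviance(y, np.exp(eta + X[:, j] * step[j])) for j in range(p)]
        j = int(np.argmin(devs))
        selected.append(j)

        # Approximate hat-matrix recursion H <- H + M_j (I - H),
        # with rank-one M_j = u u^T / fisher_j and u = W^{1/2} x_j.
        u = np.sqrt(mu) * X[:, j]
        H = H + np.outer(u, u - H.T @ u) / fisher[j]

        beta[j] += step[j]
        eta = eta + X[:, j] * step[j]
        # Exact intercept refit for the log link: makes sum(mu) equal sum(y).
        eta = eta + np.log(y.sum() / np.exp(eta).sum())

        aic.append(poisson_deviance(y, np.exp(eta)) + 2.0 * np.trace(H))
        path.append(beta.copy())

    m_stop = int(np.argmin(aic))         # complexity chosen by AIC
    return np.array(path), selected, m_stop


# Simulated data: only the first two of ten covariates are informative.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 10))
y = rng.poisson(np.exp(0.5 + 0.6 * X[:, 0] - 0.6 * X[:, 1]))
path, selected, m_stop = componentwise_boost_poisson(X, y)
print(m_stop, sorted(set(selected[:m_stop])), np.round(path[m_stop], 2))
```

Because each step updates only one coefficient by a small penalized amount, covariates that are never chosen before the AIC-optimal stopping step keep a coefficient of exactly zero; this is how componentwise boosting performs variable selection.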
Simulations show that the proposed methods are considerably more stable and more accurate in estimating the regression function than the conventional approach, especially when a large number of predictors is available. The methods also produce reasonable results when applied to real data sets, as illustrated by the Multicenter AIDS Cohort Study.
The boosting algorithm makes it possible to extract relevant predictors in generalized additive mixed models. It works in high-dimensional settings and is very stable.