Chen Junliang, Zhang Daowen, Davidian Marie
Department of Statistics, Box 8203, North Carolina State University, Raleigh, NC 27695-8203, USA.
Biostatistics. 2002 Sep;3(3):347-60. doi: 10.1093/biostatistics/3.3.347.
A popular way to represent clustered binary, count, or other data is via the generalized linear mixed model framework, which accommodates correlation through incorporation of random effects. A standard assumption is that the random effects follow a parametric family such as the normal distribution; however, this may be unrealistic or too restrictive to represent the data. We relax this assumption and require only that the distribution of random effects belong to a class of 'smooth' densities and approximate the density by the seminonparametric (SNP) approach of Gallant and Nychka (1987). This representation allows the density to be skewed, multi-modal, fat- or thin-tailed relative to the normal and includes the normal as a special case. Because an efficient algorithm to sample from an SNP density is available, we propose a Monte Carlo EM algorithm using a rejection sampling scheme to estimate the fixed parameters of the linear predictor, variance components and the SNP density. The approach is illustrated by application to a data set and via simulation.
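As a rough illustration of the two ingredients named above, the following sketch evaluates a one-dimensional SNP density of the form h(z) = P_K(z)^2 φ(z) / c, where φ is the standard normal density and c normalizes h to integrate to one, and draws from it by rejection sampling. The function names, the polynomial coefficients `coef`, and the widened-normal envelope are assumptions made for this example; the rejection scheme used in the paper may differ, and the Monte Carlo EM steps are not shown.

```python
import numpy as np


def normal_moment(n):
    """E[Z^n] for Z ~ N(0, 1): zero for odd n, (n - 1)!! for even n."""
    if n % 2 == 1:
        return 0.0
    result = 1.0
    for k in range(n - 1, 0, -2):
        result *= k
    return result


def snp_density(z, coef):
    """SNP density h(z) = P(z)^2 * phi(z) / c for a standardized random effect.

    coef holds the polynomial coefficients a_0, ..., a_K (hypothetical values
    chosen by the caller). c = E[P(Z)^2] = a' A a with A[i, j] = E[Z^(i+j)],
    so h integrates to one. coef = [1.0] gives the standard normal density.
    """
    coef = np.asarray(coef, dtype=float)
    z = np.asarray(z, dtype=float)
    K = len(coef) - 1
    A = np.array([[normal_moment(i + j) for j in range(K + 1)]
                  for i in range(K + 1)])
    c = coef @ A @ coef
    poly = np.polynomial.polynomial.polyval(z, coef)
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return poly ** 2 * phi / c


def sample_snp(n, coef, scale=2.0, rng=None):
    """Draw n values from the SNP density by rejection sampling.

    Assumption for this sketch: the proposal is N(0, scale^2) with scale > 1,
    so that h / proposal stays bounded; the bound is found numerically on a
    grid and inflated by 5% as a safety margin.
    """
    rng = np.random.default_rng() if rng is None else rng

    def proposal_pdf(x):
        return np.exp(-0.5 * (x / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))

    grid = np.linspace(-12.0, 12.0, 20001)
    bound = 1.05 * np.max(snp_density(grid, coef) / proposal_pdf(grid))

    accepted = []
    count = 0
    while count < n:
        z = rng.normal(0.0, scale, size=n)
        u = rng.uniform(size=n)
        # Accept z when u * bound * proposal(z) falls under the target density.
        keep = z[u * bound * proposal_pdf(z) <= snp_density(z, coef)]
        accepted.append(keep)
        count += keep.size
    return np.concatenate(accepted)[:n]


# Example: a skewed density from a degree-1 polynomial (illustrative values).
draws = sample_snp(5000, [1.0, 0.5], rng=np.random.default_rng(1))
```

A random effect on the original scale could then be formed as `mu + sigma * draws` for a location `mu` and scale `sigma`, both of which are placeholder names here.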