University of Maryland.
Psychol Methods. 2014 Dec;19(4):552-63. doi: 10.1037/met0000024. Epub 2014 Aug 11.
Recent studies have investigated the small-sample properties of models for clustered data, such as multilevel models and generalized estimating equations. These studies have focused on parameter bias when the number of clusters is small, but very few have addressed how the methods behave with sparse data, that is, with only a small number of observations within each cluster. In particular, no study has yet examined the sparse-data properties of generalized estimating equations, a possible alternative to multilevel models that is often overlooked in the behavioral sciences. This article begins with a discussion of population-averaged and cluster-specific models, provides a brief overview of both multilevel models and generalized estimating equations, and then reports a simulation study of the sparse-data properties of generalized estimating equations, multilevel models, and single-level regression models for both normal and binary outcomes. The simulation found that generalized estimating equations estimated regression coefficients and their standard errors without bias with as few as 2 observations per cluster, provided that the number of clusters was reasonably large. Consistent with previous studies, multilevel models tended to overestimate the between-cluster variance components when the cluster size was below about 5.
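The kind of comparison described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the article's simulation design: it assumes a normal outcome, an identity link, a single cluster-level predictor, a random intercept, and an independence working correlation for GEE. Under those assumptions the GEE point estimates coincide with ordinary least squares, and only the cluster-robust (sandwich) standard error differs from the naive single-level regression standard error. The function names (`fit_gee_independence`, `simulate`) and all parameter values are hypothetical.

```python
import math
import random


def fit_gee_independence(clusters):
    """Fit y = b0 + b1*x by GEE (identity link, independence working
    correlation) and return (b1, robust_se, naive_se).

    `clusters` is a list of clusters, each a list of (x, y) pairs.
    With an independence working correlation the point estimates equal
    OLS; only the sandwich standard error accounts for clustering.
    """
    # Accumulate X'X and X'y for the design row [1, x].
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    sxy = [0.0, 0.0]
    n = 0
    for cluster in clusters:
        for x, y in cluster:
            row = (1.0, x)
            for a in range(2):
                sxy[a] += row[a] * y
                for b in range(2):
                    sxx[a][b] += row[a] * row[b]
            n += 1
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    beta = [inv[a][0] * sxy[0] + inv[a][1] * sxy[1] for a in range(2)]

    # "Meat" of the sandwich: sum over clusters of (X_j' r_j)(X_j' r_j)'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    rss = 0.0
    for cluster in clusters:
        score = [0.0, 0.0]
        for x, y in cluster:
            r = y - beta[0] - beta[1] * x
            rss += r * r
            score[0] += r
            score[1] += x * r
        for a in range(2):
            for b in range(2):
                meat[a][b] += score[a] * score[b]

    # Robust variance of b1: element [1][1] of inv @ meat @ inv.
    robust_var = sum(inv[1][a] * meat[a][b] * inv[b][1]
                     for a in range(2) for b in range(2))
    # Naive single-level OLS variance, which ignores clustering.
    naive_var = rss / (n - 2) * inv[1][1]
    return beta[1], math.sqrt(robust_var), math.sqrt(naive_var)


def simulate(n_clusters=200, cluster_size=2, n_reps=500, b0=0.0, b1=0.5,
             tau=1.0, sigma=1.0, seed=1):
    """Compare mean standard errors with the empirical SD of the slope."""
    rng = random.Random(seed)
    estimates, robust, naive = [], [], []
    for _ in range(n_reps):
        clusters = []
        for _ in range(n_clusters):
            x = rng.gauss(0.0, 1.0)  # cluster-level predictor
            u = rng.gauss(0.0, tau)  # random intercept shared in cluster
            clusters.append([(x, b0 + b1 * x + u + rng.gauss(0.0, sigma))
                             for _ in range(cluster_size)])
        est, se_r, se_n = fit_gee_independence(clusters)
        estimates.append(est)
        robust.append(se_r)
        naive.append(se_n)
    mean_est = sum(estimates) / n_reps
    emp_sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates)
                       / (n_reps - 1))
    return {
        "mean_b1": mean_est,
        "robust_se_ratio": (sum(robust) / n_reps) / emp_sd,
        "naive_se_ratio": (sum(naive) / n_reps) / emp_sd,
    }


if __name__ == "__main__":
    print(simulate())
```

A ratio near 1 indicates an approximately unbiased standard error. With these assumed settings (intraclass correlation 0.5, two observations per cluster), the naive ratio should fall near the square root of 2/3, about 0.82, because the single-level model ignores the within-cluster correlation, while the sandwich ratio should sit close to 1 when the number of clusters is large.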