Division of Biostatistics, Berkeley School of Public Health, University of California, Berkeley, CA 94720, USA.
Epidemiology. 2010 Jul;21(4):467-74. doi: 10.1097/EDE.0b013e3181caeb90.
Two modeling approaches are commonly used to estimate the associations between neighborhood characteristics and individual-level health outcomes in multilevel studies (subjects within neighborhoods). Random effects models (or mixed models) use maximum likelihood estimation. Population average models typically use a generalized estimating equation (GEE) approach. These methods are used in place of basic regression approaches because the health of residents in the same neighborhood may be correlated, thus violating the independence assumptions made by traditional regression procedures. This violation matters most for estimates of the variability (for example, standard errors) of the estimated associations. Though the literature appears to favor the mixed-model approach, little theoretical guidance has been offered to justify this choice. In this paper, we review the assumptions behind the estimates and inference provided by these 2 approaches. We propose a perspective that treats regression models for what they are in most circumstances: reasonable approximations of some true underlying relationship. We argue in general that mixed models involve unverifiable assumptions on the data-generating distribution, which lead to potentially misleading estimates and biased inference. We conclude that the estimating-equation approach of population average models provides a more useful approximation of the truth.
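The point about within-neighborhood correlation and variability estimates can be illustrated with a small simulation. The sketch below is not from the paper: the function names (`simulate`, `fit`), the data-generating model, and all parameter values are assumptions chosen only for illustration. It fits an ordinary least-squares slope to clustered data and compares the naive standard error, which assumes independent subjects, with a cluster-robust "sandwich" standard error of the kind GEE software typically reports under an independence working correlation.

```python
import math
import random


def simulate(n_clusters=200, cluster_size=20, beta=0.5, sd_u=1.0, sd_e=1.0, seed=0):
    """Simulate subjects nested in neighborhoods (all values are illustrative assumptions)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        x = rng.gauss(0, 1)      # neighborhood-level characteristic
        u = rng.gauss(0, sd_u)   # shared neighborhood effect, induces correlation
        cluster = [(x, beta * x + u + rng.gauss(0, sd_e)) for _ in range(cluster_size)]
        data.append(cluster)
    return data


def fit(data):
    """Return the OLS slope with its naive and cluster-robust standard errors."""
    rows = [row for cluster in data for row in cluster]
    n = len(rows)
    xbar = sum(x for x, _ in rows) / n
    ybar = sum(y for _, y in rows) / n
    sxx = sum((x - xbar) ** 2 for x, _ in rows)
    slope = sum((x - xbar) * (y - ybar) for x, y in rows) / sxx
    intercept = ybar - slope * xbar

    # Naive SE: treats all n subjects as independent.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in rows)
    naive_se = math.sqrt(rss / (n - 2) / sxx)

    # Cluster-robust SE: sum score contributions within each neighborhood
    # before squaring, so within-neighborhood correlation is not ignored.
    # No small-sample correction is applied in this sketch.
    meat = 0.0
    for cluster in data:
        score = sum((x - xbar) * (y - intercept - slope * x) for x, y in cluster)
        meat += score ** 2
    robust_se = math.sqrt(meat) / sxx
    return slope, naive_se, robust_se


if __name__ == "__main__":
    slope, naive_se, robust_se = fit(simulate())
    print(f"slope={slope:.3f} naive_se={naive_se:.4f} robust_se={robust_se:.4f}")
```

Under these assumed settings the intraclass correlation is 0.5, so the naive standard error should be markedly smaller than the robust one. That gap is the violation of independence showing up in the variability estimate, while the slope itself is estimated the same way in both cases.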