Nethery Rachel C, Chen Jarvis T, Krieger Nancy, Waterman Pamela D, Peterson Emily, Waller Lance A, Coull Brent A
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Am Stat. 2022;76(2):142-151. doi: 10.1080/00031305.2021.2003245. Epub 2022 Jan 4.
Health inequities are assessed by health departments to identify social groups disproportionately burdened by disease and by academic researchers to understand how social, economic, and environmental inequities manifest as health inequities. To characterize inequities, group-specific small-area health data are often modeled using log-linear generalized linear models (GLM) or generalized linear mixed models (GLMM) with a random intercept. These approaches estimate the same marginal rate ratio comparing disease rates across groups under standard assumptions. Here we explore how residential segregation combined with social group differences in disease risk can lead to contradictory findings from the GLM and GLMM. We show that this occurs because small-area disease rate data collected under these conditions induce endogeneity in the GLMM due to correlation between the model's offset and random effect. This results in GLMM estimates that represent conditional rather than marginal associations. We refer to endogeneity arising from the offset, which to our knowledge has not been noted previously, as "offset endogeneity". We illustrate this phenomenon in simulated data and real premature mortality data, and we propose alternative modeling approaches to address it. We also introduce to a statistical audience the social epidemiologic terminology for framing health inequities, which enables responsible interpretation of results.
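The marginal versus conditional distinction described above can be sketched with a small simulation. This is an illustration under stated assumptions, not the authors' simulation design or their GLMM fit. Every function name and parameter value below is hypothetical, including the logistic link that ties an area's group composition to its random intercept. The crude rate ratio stands in for the GLM estimate: it is the maximum likelihood estimate from a Poisson GLM with only a group indicator and a log-population offset. A Mantel-Haenszel rate ratio stratified by area is a swapped-in stand-in for a conditional (within-area) estimate; it is not a GLMM.

```python
import numpy as np


def simulate_segregated_areas(n_areas=500, total_pop=10_000, base_rate=0.005,
                              cond_rr=1.5, sigma=0.5, segregation=3.0, seed=0):
    """Simulate two-group small-area counts (all settings are illustrative assumptions).

    Each area has a random intercept u. The share of group 1 in an area rises
    with u, so the group-specific populations (the offsets) are correlated
    with the area effect.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma, n_areas)                  # area-level random intercept
    share1 = 1.0 / (1.0 + np.exp(-segregation * u))      # segregation tied to u
    pop1 = np.clip(np.round(total_pop * share1), 1, total_pop - 1).astype(np.int64)
    pop0 = total_pop - pop1
    # Within every area, group 1 has cond_rr times the rate of group 0.
    y1 = rng.poisson(pop1 * base_rate * cond_rr * np.exp(u))
    y0 = rng.poisson(pop0 * base_rate * np.exp(u))
    return pop0, pop1, y0, y1


def crude_rate_ratio(pop0, pop1, y0, y1):
    """Marginal rate ratio: pooled rate in group 1 over pooled rate in group 0."""
    return (y1.sum() / pop1.sum()) / (y0.sum() / pop0.sum())


def mantel_haenszel_rate_ratio(pop0, pop1, y0, y1):
    """Area-stratified rate ratio, used here as a conditional (within-area) estimate."""
    total = pop0 + pop1
    return np.sum(y1 * pop0 / total) / np.sum(y0 * pop1 / total)


if __name__ == "__main__":
    data = simulate_segregated_areas()
    print("marginal (crude) rate ratio:", crude_rate_ratio(*data))
    print("conditional (stratified) rate ratio:", mantel_haenszel_rate_ratio(*data))
```

In this setup the stratified estimate should sit near the within-area rate ratio of 1.5, while the crude estimate should be noticeably larger because group 1 is concentrated in high-risk areas. Setting `segregation=0.0` removes the correlation between the offsets and the area effect, and the two estimates should then roughly agree.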