Hocagil Tugba Akkaya, Cook Richard J, Jacobson Sandra W, Jacobson Joseph L, Ryan Louise M
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada.
Department of Psychiatry and Behavioral Neurosciences, Wayne State University, Detroit, USA.
J R Stat Soc Ser A Stat Soc. 2021 Oct;184(4):1390-1413. doi: 10.1111/rssa.12716. Epub 2021 Jul 5.
Propensity score methodology has become increasingly popular in recent years as a tool for estimating causal effects in observational studies. Much of the related research has been directed at settings with binary or discrete exposure variables, with more recent work addressing continuous exposure variables. In environmental epidemiology, a substantial proportion of individuals is often completely unexposed while others may experience heavy exposure, leading to an exposure distribution with a point mass at zero and a heavy right tail. We suggest a new approach to handle this type of exposure data by constructing a propensity score based on a two-part model, and show how this model can be used to adjust more reliably for covariates when the exposure variable is semi-continuous. We also consider the case in which a misspecified propensity score is used in a regression adjustment and derive an explicit form of the bias. We show that the potential bias gets smaller as the estimated propensity score gets closer to the true expectation of the exposure variable given a set of observed covariates. While this result pertains to a more general setting, we use it to evaluate the potential bias in settings in which the true exposure has a semi-continuous structure. Through simulation studies, we also compare the performance of our proposed method with that of a simpler linear regression-based propensity score for a continuous exposure variable and with direct covariate adjustment. Overall, we find that using a propensity score constructed via a two-part model significantly improves the regression estimate when the exposure variable is semi-continuous in nature. Specifically, when the proportion of non-exposed subjects is high and the effects of covariates on exposure and outcome are strong, the proposed two-part propensity score method outperforms the more standard competing methods. We illustrate our method using data from the Detroit Longitudinal Cohort Study, in which the exposure variable reflects gestational alcohol exposure, with zero values and a long tail.
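The two-part idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the propensity score is the estimated mean exposure E[X | C] = P(X > 0 | C) · E[X | X > 0, C], with a logistic model for the first factor and a log-normal linear model for the second. The abstract does not state the specification of the positive part, so the log-normal choice, the function names (`fit_logistic`, `two_part_propensity`, `adjusted_effect`) and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(C, d, n_iter=50):
    """Logistic regression by Newton-Raphson; C must include an intercept column."""
    beta = np.zeros(C.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-C @ beta))
        grad = C.T @ (d - p)
        hess = (C * (p * (1.0 - p))[:, None]).T @ C
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def two_part_propensity(C, x):
    """Estimate E[X | C] = P(X > 0 | C) * E[X | X > 0, C]."""
    exposed = x > 0
    # Part 1: probability of any exposure (logistic model).
    alpha = fit_logistic(C, exposed.astype(float))
    p_pos = 1.0 / (1.0 + np.exp(-C @ alpha))
    # Part 2 (assumed log-normal): linear model for log exposure among the exposed.
    log_x = np.log(x[exposed])
    gamma, *_ = np.linalg.lstsq(C[exposed], log_x, rcond=None)
    resid = log_x - C[exposed] @ gamma
    sigma2 = resid @ resid / (exposed.sum() - C.shape[1])
    mean_pos = np.exp(C @ gamma + sigma2 / 2.0)  # log-normal mean
    return p_pos * mean_pos

def adjusted_effect(y, x, ps):
    """Regress the outcome on exposure and propensity score; return the exposure slope."""
    design = np.column_stack([np.ones_like(x), x, ps])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Hypothetical simulated data: one confounder, exposure with a point mass at zero.
rng = np.random.default_rng(0)
n = 5000
c = rng.normal(size=n)
C = np.column_stack([np.ones(n), c])
is_exposed = rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + c)))
x = np.where(is_exposed, np.exp(0.2 + 0.5 * c + rng.normal(scale=0.5, size=n)), 0.0)
y = 1.0 + 0.5 * x + 1.0 * c + rng.normal(size=n)  # true exposure effect is 0.5

ps = two_part_propensity(C, x)
beta_hat = adjusted_effect(y, x, ps)
print(f"estimated exposure effect: {beta_hat:.3f}")
```

In this simulated setting the two-part model is correctly specified, so the estimate should fall close to the true value of 0.5. Replacing `two_part_propensity` with a single linear regression of `x` on `C` would give the simpler linear regression-based score that the abstract uses as a comparator.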