Center of Sustainable and Resilient Infrastructure, Virginia Tech Transportation Institute, United States.
Center of Sustainable and Resilient Infrastructure, Virginia Tech Transportation Institute, United States; Department of Civil and Environmental Engineering, Virginia Tech, United States.
Accid Anal Prev. 2022 Jun;170:106642. doi: 10.1016/j.aap.2022.106642. Epub 2022 Mar 25.
Omitted variable bias is one of the main reasons that regression models yield incorrect estimates of the effect of a variable on the expected number of crashes. We propose differencing spatially adjacent values of the variables to reduce the effect of omitted variable bias. Differencing is a linear transformation that preserves the structure of the (generalized) linear model but can often substantially reduce the correlation between the variables. It is a special case of (generalized) partial linear model regression, which is itself a special case of a generalized additive model (GAM). In the spatial context used in this paper, differencing is similar to the well-known approach of including a spatial correlation structure (spatial error term) in the analysis of crash data. It is generally not clear how to interpret the results of models that include a spatial correlation structure, or whether and how the added structure reduces the bias in the estimated regression parameters. With differencing, however, the mechanism is clear: omitted variable bias is reduced because the correlation between the variable of interest and the omitted variables is reduced. The order of differencing determines the dominant spatial scales of the variables considered in the model, which in turn affects how much the correlation is reduced. This reveals that omitted variable bias can be reduced when there are spatial scales at which the covariate of interest varies but the omitted variables either 1) are relatively homogeneous or 2) have variations that are uncorrelated with those of the variable of interest.
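The mechanism described in the abstract can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and is not taken from the paper: the data are simulated, an ordinary linear model stands in for the (generalized) linear crash model, only first-order differencing is shown, and the names `simulate` and `ols_slope` are hypothetical. The omitted variable `z` varies smoothly along a corridor (large spatial scale), while the covariate of interest `x` also has local variation that is independent of `z`.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def simulate(n=2000, beta=1.0, gamma=2.0, seed=0):
    """Synthetic corridor data (illustrative assumption, not the paper's data)."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n)              # position of each adjacent segment
    z = np.sin(2.0 * np.pi * s)               # omitted variable: smooth in space
    x = z + rng.normal(0.0, 0.5, n)           # covariate: correlated with z, plus local variation
    y = beta * x + gamma * z + rng.normal(0.0, 0.5, n)  # z is left out of the fitted model
    return x, y, z


true_beta = 1.0
x, y, z = simulate(beta=true_beta)

# Regression on the raw variables: x absorbs part of the effect of z.
naive = ols_slope(x, y)

# First-order differencing of spatially adjacent values: z is nearly constant
# between neighbours, so it almost vanishes from the differenced model.
dx, dy, dz = np.diff(x), np.diff(y), np.diff(z)
differenced = ols_slope(dx, dy)

corr_raw = float(np.corrcoef(x, z)[0, 1])
corr_diff = float(np.corrcoef(dx, dz)[0, 1])

print(f"true effect:               {true_beta:.3f}")
print(f"estimate, raw data:        {naive:.3f}")
print(f"estimate, differenced:     {differenced:.3f}")
print(f"corr(x, z) raw:            {corr_raw:.3f}")
print(f"corr(dx, dz) differenced:  {corr_diff:.3f}")
```

In this setup the raw estimate is biased because `x` and `z` are strongly correlated, whereas differencing removes most of that correlation and the estimate moves close to the true coefficient. The sketch corresponds to case 1) above, where the omitted variable is relatively homogeneous at the spatial scale retained by differencing; it says nothing about how well the approach works on real crash data.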