Rehm J, Arminger G, Kohlmeier L
Swiss Institute for the Prevention of Alcohol and Drug Problems, Lausanne.
Stat Med. 1992 Jun 30;11(9):1195-208. doi: 10.1002/sim.4780110906.
Omitted variable bias is discussed in the context of linear models. It is shown that the effect of omitted variables can be controlled in linear models for metric dependent variables by using data from follow-up studies. Two different models for analysing such data are proposed. In the first model the omitted variables are assumed to be uncorrelated with the explanatory variables in the model and to be constant over time. These assumptions lead to a special structure of the covariance matrix of the errors over time. Efficient estimation of the parameters in the linear model has to take this special covariance matrix of the errors into account by using appropriate generalized least squares or maximum likelihood methods. In the second model the omitted variables are again assumed to be constant over time, but they are allowed to be correlated with the explanatory variables; that is, these variables are omitted confounders in the usual epidemiological sense. It is shown that even in this case the parameters of the linear model can be estimated consistently with ordinary least squares if a follow-up study is available. The differences between the parameter estimates under the first and the second model may be used to construct a Hausman test for misspecification. The models, the estimation methods and the Hausman test are illustrated by an example that explores the determinants of serum cholesterol in German adolescents of both sexes.
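The second model's claim, that ordinary least squares remains consistent when a time-constant confounder is omitted, can be illustrated with a small simulation. The sketch below is not the authors' procedure or data. It assumes a single explanatory variable, simulated follow-up data, and one common way of removing a time-constant term, namely subtracting each subject's own mean over the waves before running OLS. All names (`simulate`, `demean_within`, `ols_slope`) and all numeric settings are hypothetical choices made for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_within(values, ids):
    """Subtract each subject's own mean over time from its observations."""
    totals, counts = {}, {}
    for v, i in zip(values, ids):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, ids)]


def simulate(n_subjects=2000, n_waves=3, beta=2.0, seed=0):
    """Simulated follow-up data (assumed design, not the study's data)."""
    rng = random.Random(seed)
    ids, x, y = [], [], []
    for i in range(n_subjects):
        u = rng.gauss(0, 1)  # omitted confounder, constant over time
        for _ in range(n_waves):
            x_it = 0.8 * u + rng.gauss(0, 1)  # x is correlated with u
            y_it = beta * x_it + u + rng.gauss(0, 1)
            ids.append(i)
            x.append(x_it)
            y.append(y_it)
    return ids, x, y


ids, x, y = simulate()

# Pooled OLS ignores the omitted confounder and is biased.
pooled = ols_slope(x, y)

# OLS on subject-demeaned data: the time-constant confounder cancels out.
within = ols_slope(demean_within(x, ids), demean_within(y, ids))

print(f"true slope 2.0, pooled OLS {pooled:.3f}, within-subject OLS {within:.3f}")
```

With these settings the pooled estimate should sit visibly above the true slope, while the within-subject estimate should land close to it. A Hausman-type comparison would additionally need a GLS estimate under the first model and the covariance matrices of both estimators, which this sketch does not compute.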