Cook Thomas D, Steiner Peter M, Pohl Steffi
a Institute for Policy Research, Northwestern University.
b Friedrich-Schiller-Universität, Jena, Germany.
Multivariate Behav Res. 2009 Nov 30;44(6):828-47. doi: 10.1080/00273170903333673.
This study uses within-study comparisons to assess the relative importance of three factors for bias reduction: covariate choice, unreliability in the measurement of those covariates, and whether regression or one of several forms of propensity score analysis is used to analyze the outcome data. Two of the within-study comparisons are of the four-arm type, and many more are of the three-arm type. To examine unreliability, simulated differences in reliability are deliberately introduced into the two four-arm studies. Results are similar across the studies reviewed, despite their wide range of non-experimental designs and topic areas. Covariate choice matters most, unreliability next most, and the mode of data analysis hardly matters at all. Unreliability has larger effects the more important a covariate is for bias reduction, but even so the very best covariates measured with a reliability of only .60 still do better than substantively poor covariates that are measured perfectly. Why regression methods do as well as propensity score methods used in several different ways remains a mystery because, in theory, propensity scores would seem to have a distinct advantage in many practical applications, especially those where functional forms are in doubt.
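The interplay between covariate choice and unreliability described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure and uses none of their data: the data-generating model, the coefficients, the sample size, and the names `add_measurement_error`, `effect_estimate`, and `simulate` are all assumptions made for illustration. It degrades a covariate to a target reliability by adding independent noise, then compares the bias that remains after regression adjustment on (a) a strong covariate measured perfectly, (b) the same covariate at reliability .60, and (c) a weak covariate measured perfectly.

```python
import numpy as np


def add_measurement_error(x, reliability, rng):
    """Add independent normal noise so var(x) / var(observed) is about `reliability`."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    error_var = np.var(x) * (1.0 - reliability) / reliability
    return x + rng.normal(0.0, np.sqrt(error_var), size=x.shape)


def effect_estimate(y, treat, covariate=None):
    """OLS coefficient on `treat`, optionally adjusting for one covariate."""
    columns = [np.ones_like(y), treat]
    if covariate is not None:
        columns.append(covariate)
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return coef[1]


def simulate(n=50_000, true_effect=1.0, seed=0):
    """Return the bias of each estimate under an assumed, purely illustrative model."""
    rng = np.random.default_rng(seed)

    # Assumed: `strong` drives both selection into treatment and the outcome.
    strong = rng.normal(size=n)
    # Assumed: `weak` correlates only about 0.2 with `strong`.
    weak = 0.2 * strong + np.sqrt(1.0 - 0.2 ** 2) * rng.normal(size=n)

    # Non-random selection: higher `strong` makes treatment more likely.
    p_treat = 1.0 / (1.0 + np.exp(-1.5 * strong))
    treat = (rng.uniform(size=n) < p_treat).astype(float)

    y = true_effect * treat + 2.0 * strong + rng.normal(size=n)

    strong_r60 = add_measurement_error(strong, 0.60, rng)

    estimates = {
        "unadjusted": effect_estimate(y, treat),
        "strong, reliability 1.00": effect_estimate(y, treat, strong),
        "strong, reliability 0.60": effect_estimate(y, treat, strong_r60),
        "weak, reliability 1.00": effect_estimate(y, treat, weak),
    }
    return {name: est - true_effect for name, est in estimates.items()}


if __name__ == "__main__":
    for name, bias in simulate().items():
        print(f"{name:26s} bias = {bias:+.3f}")
```

Under these assumed parameters, the unreliable strong covariate leaves noticeably more bias than its perfectly measured version but still less than the perfectly measured weak covariate, which mirrors the ordering reported in the abstract. The size of each gap depends entirely on the chosen coefficients, so the sketch shows the mechanism only and says nothing about the magnitudes found in the study.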