Heckman J J, Ichimura H, Smith J, Todd P
Department of Economics, University of Chicago, IL 60637, USA.
Proc Natl Acad Sci U S A. 1996 Nov 12;93(23):13416-20. doi: 10.1073/pnas.93.23.13416.
This paper decomposes the conventional measure of selection bias in observational studies into three components. The first two components are due to differences in the distributions of characteristics between participant and nonparticipant (comparison) group members: the first arises from differences in the supports, and the second from differences in densities over the region of common support. The third component arises from selection bias precisely defined. Using data from a recent social experiment, we find that the component due to selection bias, precisely defined, is smaller than the first two components. However, selection bias still represents a substantial fraction of the experimental impact estimate. The empirical performance of matching methods of program evaluation is also examined. We find that matching based on the propensity score eliminates some but not all of the measured selection bias, with the remaining bias still a substantial fraction of the estimated impact. We find that the support of the distribution of propensity scores for the comparison group is typically only a small portion of the support for the participant group. For values outside the common support, it is impossible to reliably estimate the effect of program participation using matching methods. If the impact of participation depends on the propensity score, as we find in our data, the failure of the common support condition severely limits matching compared with random assignment as an evaluation estimator.
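The three-way decomposition described above can be illustrated with a minimal sketch. It assumes a single discrete characteristic X and that the mean no-program outcome is known for both groups at each value of X. For participants, that outcome would come from an experimental control group. The function name `decompose_bias`, the input layout, and the numbers are hypothetical and chosen for illustration only. This is an unnormalized discrete analogue, not the authors' estimator.

```python
def decompose_bias(participants, comparison):
    """Split the raw mean difference in no-program outcomes into three parts.

    Each argument maps a value of X to a tuple
    (mean no-program outcome at that X, probability mass of that X in the group).
    This layout is an assumption made for the illustration.
    """
    s1, s0 = set(participants), set(comparison)
    common = s1 & s0  # region of common support

    # Component 1: contribution of X values outside the common support.
    support = (
        sum(participants[x][0] * participants[x][1] for x in s1 - common)
        - sum(comparison[x][0] * comparison[x][1] for x in s0 - common)
    )
    # Component 2: different weighting of X within the common support,
    # holding the comparison-group outcome function fixed.
    density = sum(
        comparison[x][0] * (participants[x][1] - comparison[x][1]) for x in common
    )
    # Component 3: outcome differences at the same X within the common
    # support, weighted by the participant distribution.
    selection = sum(
        (participants[x][0] - comparison[x][0]) * participants[x][1] for x in common
    )
    # Conventional measure: raw difference in group means.
    total = (
        sum(m * p for m, p in participants.values())
        - sum(m * p for m, p in comparison.values())
    )
    return {"support": support, "density": density,
            "selection": selection, "total": total}


# Hypothetical data: X = 3 appears only among participants,
# X = 0 only in the comparison group.
participants = {1: (10.0, 0.5), 2: (12.0, 0.3), 3: (20.0, 0.2)}
comparison = {0: (5.0, 0.4), 1: (9.0, 0.4), 2: (12.0, 0.2)}

parts = decompose_bias(participants, comparison)
print(parts)
```

In this made-up example the support and density components each exceed the selection component, which mirrors the qualitative pattern the abstract reports. The three parts sum to the conventional measure by construction.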