Rigdon Joseph, Baiocchi Michael, Basu Sanjay
Stanford University.
J Stat Softw. 2018;86(CS-5). doi: 10.18637/jss.v086.c05. Epub 2018 Sep 3.
Estimating the causal effect of an intervention from observational data is difficult because of unmeasured confounders. Many analysts use instrumental variables (IVs) to introduce a randomizing element into observational data analysis, potentially reducing the bias created by unobserved confounders. Several persistent problems limit IV analyses. Chief among them is the prevalence of "weak" IVs, that is, instrumental variables that do not effectively randomize individuals to the intervention or control group, which leads to biased and unstable treatment effect estimates. In addition, IV-based estimates are often highly model dependent, require parametric adjustment for measured confounders, and have high mean squared error. To overcome these problems, the study design method of "near-far matching" has been devised, which "filters" data from a cohort by simultaneously matching individuals within the cohort to be "near" (similar) on measured confounders and "far" (different) on levels of an IV. To facilitate the application of near-far matching to analytical problems, we introduce the R package nearfar and illustrate its application to both a classical example and a simulated dataset. We show how the package can be used to "strengthen" a weak IV by adjusting the "near-ness" and "far-ness" of a match, reduce model dependency, enable nonparametric adjustment for measured confounders, and lower the mean squared error of estimated causal effects. We additionally show how to use the package when analyzing either continuous or binary treatments, how to prioritize variables in the match, and how to calculate statistics of IV strength with or without adjustment for measured confounders.
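The idea of matching "near" on measured confounders and "far" on the IV can be illustrated with a small Python sketch. This is not the package's algorithm or API: the function name near_far_pairs, the iv_gap threshold, the Euclidean distance, and the greedy pairing are all assumptions chosen for brevity, and a greedy pass will not in general find the best overall match.

```python
import math


def near_far_pairs(covariates, iv, iv_gap):
    """Greedy sketch of near-far matching (hypothetical helper, not the package API).

    covariates: list of numeric tuples, one per individual (measured confounders)
    iv:         list of instrument values, one per individual
    iv_gap:     minimum IV difference for a pair to count as "far" (assumed threshold)

    Returns (encouraged, unencouraged) index pairs; the first index has the
    higher IV value. Individuals left unpaired are filtered out.
    """
    candidates = []
    n = len(covariates)
    for i in range(n):
        for j in range(i + 1, n):
            # "Far": only consider pairs separated by at least iv_gap on the IV.
            if abs(iv[i] - iv[j]) < iv_gap:
                continue
            # "Near": plain Euclidean distance on the confounders (an assumption).
            candidates.append((math.dist(covariates[i], covariates[j]), i, j))

    # Take the closest admissible pairs first, using each individual at most once.
    candidates.sort()
    used, pairs = set(), []
    for _, i, j in candidates:
        if i in used or j in used:
            continue
        used.update((i, j))
        pairs.append((i, j) if iv[i] >= iv[j] else (j, i))
    return pairs


# Toy data: (age, severity score) as confounders and a continuous instrument.
covariates = [(30, 1.0), (31, 1.1), (60, 3.0), (61, 3.1)]
iv = [0.1, 0.9, 0.2, 0.8]
pairs = near_far_pairs(covariates, iv, iv_gap=0.5)
print(sorted(pairs))
```

In this toy example, raising iv_gap makes the pairs "farther" on the instrument at the cost of discarding more individuals, which is the trade-off behind strengthening a weak IV.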