Handling Missing Data in Instrumental Variable Methods for Causal Inference

Authors

Kennedy Edward H, Mauro Jacqueline A, Daniels Michael J, Burns Natalie, Small Dylan S

Affiliations

Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, USA, 15213.

Department of Statistics, University of Florida, Gainesville, USA 32611.

Publication information

Annu Rev Stat Appl. 2019 Mar;6(1):125-148. doi: 10.1146/annurev-statistics-031017-100353. Epub 2018 Nov 28.

DOI:10.1146/annurev-statistics-031017-100353

PMID:33834080

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8025985/

Abstract

It is very common in instrumental variable studies for there to be missing instrument data. For example, in the Wisconsin Longitudinal Study one can use genotype data as a Mendelian randomization-style instrument, but this information is often missing when subjects do not contribute saliva samples, or when the genotyping platform output is ambiguous. Here we review missing-at-random assumptions one can use to identify instrumental variable causal effects, and discuss various approaches for estimation and inference. We consider likelihood-based methods, regression and weighting estimators, and doubly robust estimators. The likelihood-based methods yield the most precise inference, and are optimal under the model assumptions, while the doubly robust estimators can attain the nonparametric efficiency bound while allowing flexible nonparametric estimation of nuisance functions (e.g., instrument propensity scores). The regression and weighting estimators can sometimes be easiest to describe and implement. Our main contribution is an extensive review of this wide array of estimators under varied missing-at-random assumptions, along with discussion of asymptotic properties and inferential tools. We also implement many of the estimators in an analysis of the Wisconsin Longitudinal Study, to study effects of impaired cognitive functioning on depression.

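Illustrative sketch

To make the weighting idea in the abstract concrete, here is a minimal sketch of an inverse-probability-weighted Wald (ratio) estimator for a binary instrument that is missing for some subjects. It is not the estimator from the article. It assumes one simple missing-at-random condition, namely that missingness of the instrument depends only on a fully observed discrete covariate. The function name `ipw_wald`, the variable names, and the simulated data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def ipw_wald(y, a, z, r, x):
    """Inverse-probability-weighted Wald (ratio) estimate.

    y, a : outcome and treatment, fully observed
    z    : binary instrument; entries where r == 0 are ignored
    r    : 1 if the instrument is observed, 0 if it is missing
    x    : fully observed discrete covariate ASSUMED to drive missingness
    """
    y, a, z, r, x = map(np.asarray, (y, a, z, r, x))

    # Estimate P(R = 1 | X = x) by the observed fraction within each stratum of x.
    pi = np.empty(len(x), dtype=float)
    for level in np.unique(x):
        idx = x == level
        pi[idx] = r[idx].mean()

    # Keep complete cases and weight each by 1 / P(R = 1 | X).
    obs = r == 1
    w = 1.0 / pi[obs]
    y_o, a_o, z_o = y[obs], a[obs], z[obs]

    def wmean(v, mask):
        return np.sum(w[mask] * v[mask]) / np.sum(w[mask])

    z1, z0 = z_o == 1, z_o == 0
    numerator = wmean(y_o, z1) - wmean(y_o, z0)    # weighted effect of Z on Y
    denominator = wmean(a_o, z1) - wmean(a_o, z0)  # weighted effect of Z on A
    return numerator / denominator


# Simulated data (hypothetical): the true effect of a on y is 2.0.
rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 2, n)                 # observed covariate
u = rng.normal(size=n)                    # unmeasured confounder
z = rng.integers(0, 2, n).astype(float)   # randomized binary instrument
a = (0.5 * u + 1.5 * z + rng.normal(size=n) > 0.75).astype(float)
y = 2.0 * a + x + u + rng.normal(size=n)

# Instrument is observed with probability 0.9 when x == 1 and 0.5 when x == 0.
r = rng.binomial(1, np.where(x == 1, 0.9, 0.5))
z_obs = np.where(r == 1, z, np.nan)       # missing instrument values become NaN

est = ipw_wald(y, a, z_obs, r, x)
print(f"IPW Wald estimate: {est:.3f}")
```

In this simulation the treatment effect is constant, so the ratio targets the true value of 2.0 despite the unmeasured confounder. With real data, the missingness probabilities would typically need a fitted model rather than stratum frequencies, and standard errors would need to account for the estimated weights.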