Semiparametric Estimation with Data Missing Not at Random Using an Instrumental Variable.

Authors

Sun BaoLuo, Liu Lan, Miao Wang, Wirth Kathleen, Robins James, Tchetgen Tchetgen Eric J

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health.

Beijing International Center for Mathematical Research, Peking University.

Publication information

Stat Sin. 2018 Oct;28(4):1965-1983. doi: 10.5705/ss.202016.0324.

Abstract

Missing data occur frequently in empirical studies in health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables, the missing data mechanism still depends on the unobserved outcome. In such settings, identification is generally not possible without imposing additional assumptions. Identification is sometimes possible, however, if an instrumental variable (IV) is observed for all subjects which satisfies the exclusion restriction that the IV affects the missingness process without directly influencing the outcome. In this paper, we provide necessary and sufficient conditions for nonparametric identification of the full data distribution under MNAR with the aid of an IV. In addition, we give sufficient identification conditions that are more straightforward to verify in practice. For inference, we focus on estimation of a population outcome mean, for which we develop a suite of semiparametric estimators that extend methods previously developed for data missing at random. Specifically, we propose inverse probability weighted estimation, outcome regression-based estimation and doubly robust estimation of the mean of an outcome subject to MNAR. For illustration, the methods are used to account for selection bias induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer characteristics such as gender, age and years of experience as IVs.
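As a rough illustration of the inverse probability weighting idea mentioned in the abstract, the sketch below simulates a binary outcome whose chance of being observed depends on the outcome itself (MNAR) and on a binary instrument. It is not the authors' estimator: the response propensity is treated as known, whereas in practice it would have to be estimated, and every name and parameter value here (`response_prob`, the logistic coefficients, the true mean of 0.3) is a hypothetical choice made for the example.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def response_prob(y, z):
    # Hypothetical response propensity P(R=1 | Y=y, Z=z). It depends on the
    # outcome y (so missingness is MNAR) and on the instrument z.
    return expit(0.5 - 1.0 * y + 1.0 * z)


def simulate(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0  # instrument, drawn independently of y
        y = 1 if rng.random() < 0.3 else 0  # binary outcome, true mean 0.3 (assumed)
        r = 1 if rng.random() < response_prob(y, z) else 0  # 1 = outcome observed
        rows.append((z, y, r))
    return rows


def complete_case_mean(rows):
    # Naive mean over respondents only; biased when missingness depends on y.
    observed = [y for _, y, r in rows if r == 1]
    return sum(observed) / len(observed)


def ipw_mean(rows):
    # Only respondents contribute, each weighted by 1 / P(R=1 | y, z);
    # the sum is divided by the full sample size.
    total = sum(y / response_prob(y, z) for z, y, r in rows if r == 1)
    return total / len(rows)


rows = simulate(200_000)
cc = complete_case_mean(rows)
ipw = ipw_mean(rows)
print(f"complete-case mean: {cc:.3f}, IPW mean: {ipw:.3f}")
```

In this setup, subjects with `y = 1` respond less often, so the complete-case mean understates the true mean, while the weighted mean corrects for it because the weights use the true propensity.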


