HANDLING MISSING DATA BY DELETING COMPLETELY OBSERVED RECORDS.

Authors

Paik Myunghee Cho, Wang Cuiling

Affiliation

Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168 Street, New York City, N.Y. 10032, U.S.A.

Publication information

J Stat Plan Inference. 2009 Jul 1;139(7):2341-2350. doi: 10.1016/j.jspi.2008.10.024.

DOI:10.1016/j.jspi.2008.10.024

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2674251/

Abstract

When data are missing, analyzing only the records that are completely observed may cause bias or inefficiency. Existing approaches to handling missing data include likelihood methods, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao and Lipsitz, 1992). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.
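
As background for the comparison the abstract draws, the following is a minimal sketch of the two baseline analyses it names: complete-case regression and standard inverse probability weighting. It does not implement the three proposed estimators, whose details are not given in the abstract. The data-generating model, the observation probabilities (0.9 and 0.3, treated as known), the sample size, and the function names `simulate`, `weighted_slope` and `compare` are all assumptions made for illustration.

```python
import random


def simulate(n, seed=0):
    """Draw (x, y, r, pi) tuples; the covariate x counts as missing when r == 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # assumed model, true slope = 2
        pi = 0.9 if y > 1.0 else 0.3             # assumed P(x observed | y)
        r = 1 if rng.random() < pi else 0
        rows.append((x, y, r, pi))
    return rows


def weighted_slope(xs, ys, ws):
    """Slope from weighted least squares of y on x (with an intercept)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


def compare(n=20000, seed=0):
    """Return (complete-case slope, IPW slope) on one simulated data set."""
    observed = [(x, y, pi) for x, y, r, pi in simulate(n, seed) if r == 1]
    xs = [x for x, _, _ in observed]
    ys = [y for _, y, _ in observed]
    # Complete-case analysis: every fully observed record gets weight 1.
    cc = weighted_slope(xs, ys, [1.0] * len(xs))
    # Standard IPW: each fully observed record gets weight 1 / pi.
    ipw = weighted_slope(xs, ys, [1.0 / pi for _, _, pi in observed])
    return cc, ipw


if __name__ == "__main__":
    cc, ipw = compare()
    print(f"complete-case slope: {cc:.3f}")
    print(f"IPW slope:           {ipw:.3f}")
```

Because missingness here depends on the outcome, the complete-case slope is pulled away from the true value of 2, while the weighted fit recovers it approximately. The weights 1/0.3 and 1/0.9 in this sketch are mild; with smaller observation probabilities the inverse weights grow large, which is the kind of instability the abstract says the proposed weighting method reduces.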


Similar articles

1

HANDLING MISSING DATA BY DELETING COMPLETELY OBSERVED RECORDS.

J Stat Plan Inference. 2009 Jul 1;139(7):2341-2350. doi: 10.1016/j.jspi.2008.10.024.

2

Comparison between inverse-probability weighting and multiple imputation in Cox model with missing failure subtype.

Stat Methods Med Res. 2024 Feb;33(2):344-356. doi: 10.1177/09622802231226328. Epub 2024 Jan 23.

3

Best linear inverse probability weighted estimation for two-phase designs and missing covariate regression.

Stat Med. 2019 Jul 10;38(15):2783-2796. doi: 10.1002/sim.8141. Epub 2019 Mar 25.

4

Smoothed Rank Regression for the Accelerated Failure Time Competing Risks Model with Missing Cause of Failure.

Stat Sin. 2019 Jan;29(1):23-46. doi: 10.5705/ss.202016.0231.

5

Propensity score analysis with partially observed covariates: How should multiple imputation be used?

Stat Methods Med Res. 2019 Jan;28(1):3-19. doi: 10.1177/0962280217713032. Epub 2017 Jun 2.

6

Accounting for nonmonotone missing data using inverse probability weighting.

Stat Med. 2023 Oct 15;42(23):4282-4298. doi: 10.1002/sim.9860. Epub 2023 Jul 31.

7

Adjusting for selection bias due to missing data in electronic health records-based research.

Stat Methods Med Res. 2021 Oct;30(10):2221-2238. doi: 10.1177/09622802211027601. Epub 2021 Aug 26.

8

Robust best linear weighted estimator with missing covariates in survival analysis.

Stat Med. 2024 Apr 30;43(9):1790-1803. doi: 10.1002/sim.10044. Epub 2024 Feb 25.

9

Missing data approaches for probability regression models with missing outcomes with applications.

J Stat Distrib Appl. 2014;1. doi: 10.1186/s40488-014-0023-3. Epub 2014 Dec 16.

10

Estimating causal effects for binary outcomes using per-decision inverse probability weighting.

Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae025.

References cited in this article

1

Likelihood methods for incomplete longitudinal binary responses with incomplete categorical covariates.

Biometrics. 1999 Mar;55(1):214-23. doi: 10.1111/j.0006-341x.1999.00214.x.

2

Estimating equations with nonignorably missing response data.

Biometrics. 1999 Sep;55(3):984-9. doi: 10.1111/j.0006-341x.1999.00984.x.

3

The relationship between hot-deck multiple imputation and weighted likelihood.

Stat Med. 1997;16(1-3):5-19. doi: 10.1002/(sici)1097-0258(19970115)16:1<5::aid-sim469>3.0.co;2-8.

4

Regression analysis with missing covariate data using estimating equations.

Biometrics. 1996 Dec;52(4):1165-82.

5

Missing data in longitudinal studies.

Stat Med. 1988 Jan-Feb;7(1-2):305-15. doi: 10.1002/sim.4780070131.

6

Designs and analysis of two-stage studies.

Stat Med. 1992 Apr;11(6):769-82. doi: 10.1002/sim.4780110608.
