Instability of inverse probability weighting methods and a remedy for nonignorable missing data.

Affiliations

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.

National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA.

Publication information

Biometrics. 2023 Dec;79(4):3215-3226. doi: 10.1111/biom.13881. Epub 2023 May 23.

Abstract

Inverse probability weighting (IPW) methods are commonly used to analyze nonignorable missing data (NIMD) under the assumption of a logistic model for the missingness probability. However, solving IPW equations numerically may involve nonconvergence problems when the sample size is moderate and the missingness probability is high. Moreover, those equations often have multiple roots, and identifying the best root is challenging. Therefore, IPW methods may have low efficiency or even produce biased results. We identify the pitfall in these methods pathologically: they involve the estimation of a moment-generating function (MGF), and such functions are notoriously unstable in general. As a remedy, we model the outcome distribution given the covariates of the completely observed individuals semiparametrically. After forming an induced logistic regression (LR) model for the missingness status of the outcome and covariate, we develop a maximum conditional likelihood method to estimate the underlying parameters. The proposed method circumvents the estimation of an MGF and hence overcomes the instability of IPW methods. Our theoretical and simulation results show that the proposed method outperforms existing competitors greatly. Two real data examples are analyzed to illustrate the advantages of our method. We conclude that if only a parametric LR is assumed but the outcome regression model is left arbitrary, then one has to be cautious in using any of the existing statistical methods in problems involving NIMD.
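
To illustrate the instability the abstract attributes to moment-generating functions, consider a logistic missingness model that is linear in the outcome (an assumption made here for illustration; the abstract does not spell out the model). The inverse of a logistic probability is 1 + exp(-(linear predictor)), so inverse-weighted sums contain sample averages of exponentials of the outcome, which are empirical MGF values. The following is a minimal sketch, not the authors' method: the standard normal outcome, the sample size of 200, the number of replications, and the function names `empirical_mgf` and `relative_spread` are all hypothetical choices.

```python
import numpy as np


def empirical_mgf(y, t):
    """Plug-in estimate of E[exp(t * Y)]: the sample mean of exp(t * y)."""
    return float(np.mean(np.exp(t * np.asarray(y, dtype=float))))


def relative_spread(t, n=200, reps=2000, seed=0):
    """Standard deviation of the empirical MGF across simulated samples,
    divided by the true MGF value.

    Assumption for illustration only: Y ~ N(0, 1), whose MGF is exp(t**2 / 2).
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    # One empirical MGF estimate per simulated sample of size n.
    estimates = np.exp(t * samples).mean(axis=1)
    true_mgf = np.exp(t ** 2 / 2)
    return float(estimates.std() / true_mgf)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 3.0):
        print(f"t = {t:3.1f}  relative spread = {relative_spread(t):.3f}")
```

In this toy setting the relative spread of the estimate grows quickly with the magnitude of `t`, because a handful of large outcome values dominate the average of exponentials. That is the general behavior the abstract points to when it calls MGF estimation unstable; the sketch says nothing about the performance of the proposed conditional likelihood method.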
