1 Department of Biostatistics, University of Liverpool, UK.
2 School of Mathematics & Statistics, Newcastle University, UK.
Stat Methods Med Res. 2019 Jan;28(1):289-308. doi: 10.1177/0962280217722382. Epub 2017 Jul 26.
Sensitivity analysis is popular for handling missing data problems, particularly non-ignorable missingness, where a full-likelihood method cannot be adopted. It examines how sensitively the conclusions (output) depend on the assumptions or parameters (input) about the missing data, i.e. the missing data mechanism. We call models subject to this uncertainty sensitivity models. To make conventional sensitivity analysis more useful in practice, we need to define simple and interpretable statistical quantities for assessing the sensitivity models and supporting evidence-based analysis. In this paper we propose a novel approach that investigates the plausibility of each missing data mechanism assumption by comparing datasets simulated from various missing-not-at-random (MNAR) models with the observed data non-parametrically, using K-nearest-neighbour distances. Some asymptotic theory is also provided. A key step of the method is to plug in a plausibility evaluation system for each sensitivity parameter, so that plausible values are selected and unlikely values rejected, instead of considering all proposed values of the sensitivity parameters as in conventional sensitivity analysis. The method is generic and has been applied successfully to several specific models in this paper, including a meta-analysis model with publication bias, the analysis of incomplete longitudinal data, and mean estimation with non-ignorable missing data.
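The idea of scoring each sensitivity parameter by comparing simulated and observed data can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy MNAR model (standard normal outcomes, observed with probability expit(delta * y)), the function names, and the scoring rule. In place of the paper's K-nearest-neighbour distance statistic, which the abstract does not specify, the sketch uses a pooled-sample nearest-neighbour coincidence fraction: the share of each point's k nearest neighbours that come from the same sample. When the simulated and observed data follow the same distribution, this share should be close to its value under random labelling, so a large excess suggests an implausible value of delta.

```python
import numpy as np


def knn_same_sample_fraction(x, y, k=3):
    """Share of k-nearest-neighbour links in the pooled sample that join
    two points from the same sample (x with x, or y with y)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.zeros(len(x), dtype=int),
                             np.ones(len(y), dtype=int)])
    # Pairwise squared Euclidean distances; O(n^2) memory, fine for a sketch.
    diff = pooled[:, None, :] - pooled[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    return float((labels[neighbours] == labels[:, None]).mean())


def null_fraction(n1, n2):
    """Expected same-sample share if labels were assigned at random."""
    n = n1 + n2
    return (n1 * (n1 - 1) + n2 * (n2 - 1)) / (n * (n - 1))


def simulate_observed(delta, n, rng):
    """Hypothetical toy MNAR model: Y ~ N(0, 1) and
    P(observed | y) = 1 / (1 + exp(-delta * y)). Returns observed values only."""
    y = rng.normal(size=n)
    p_obs = 1.0 / (1.0 + np.exp(-delta * y))
    return y[rng.random(n) < p_obs]


def plausibility_scan(observed, deltas, n_sim, rng, k=3):
    """For each candidate delta, simulate observed data under the toy model
    and return the excess of the same-sample share over its null value.
    Smaller excess means the candidate reproduces the observed data better."""
    scores = {}
    for delta in deltas:
        simulated = simulate_observed(delta, n_sim, rng)
        stat = knn_same_sample_fraction(observed, simulated, k)
        scores[delta] = stat - null_fraction(len(observed), len(simulated))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Pretend the data at hand were generated with delta = 1.5.
    observed = simulate_observed(1.5, 1500, rng)
    for delta, excess in plausibility_scan(
            observed, [-1.5, 0.0, 1.5], 1500, rng).items():
        print(f"delta={delta:+.1f}  excess same-sample share={excess:.3f}")
```

In a real application the complete-data model would be fitted under each candidate sensitivity parameter rather than fixed in advance, and a threshold for rejecting implausible values would need to be calibrated, for instance through the kind of asymptotic theory the abstract mentions.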