Zhao Jiwei
Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY, USA.
J Nonparametr Stat. 2017;29(3):577-593. doi: 10.1080/10485252.2017.1339306. Epub 2017 Jun 14.
In missing data analysis, the assumption about the missing data mechanism is crucial. Different assumptions call for different statistical methods; in practice, however, such assumptions are usually unverifiable. A less stringent, and hence more flexible, assumption is therefore preferred. In this paper, we consider a generally applicable missing data mechanism that includes various instances of all three scenarios: missing completely at random, missing at random, and missing not at random. Under this general missing data mechanism, we introduce the conditional likelihood and its approximate version as the basis for estimating the unknown parameter of interest. Because this approximate conditional likelihood uses only the completely observed samples, it may result in large estimation bias, which can degrade statistical inference and jeopardize other statistical procedures. To tackle this problem, we propose using resampling techniques to reduce the estimation bias. We consider both the Jackknife and the Bootstrap. We compare their asymptotic biases through a higher order expansion up to ( ). We also derive results for the mean squared error as a measure of estimation accuracy. We conduct comprehensive simulation studies under different settings to illustrate the proposed method, and we apply the method to a prostate cancer data analysis.
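As a rough illustration of the two resampling corrections named in the abstract, the following Python sketch applies the textbook jackknife and bootstrap bias corrections to a deliberately biased toy estimator. Everything here is an assumption made for illustration: the plug-in variance stands in for the paper's maximum approximate conditional likelihood estimator, the function names `plugin_variance`, `jackknife_bias_corrected`, and `bootstrap_bias_corrected` are hypothetical, and the formulas are the standard ones, not necessarily the exact form analysed in the paper.

```python
import random
from statistics import fmean


def plugin_variance(xs):
    """Toy biased estimator (divides by n); a stand-in, not the paper's estimator."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def jackknife_bias_corrected(estimator, xs):
    """Standard jackknife correction: n * theta_hat - (n - 1) * mean of leave-one-out estimates."""
    n = len(xs)
    theta_hat = estimator(xs)
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * theta_hat - (n - 1) * fmean(leave_one_out)


def bootstrap_bias_corrected(estimator, xs, n_boot=2000, seed=0):
    """Standard bootstrap correction: 2 * theta_hat - mean of bootstrap replicates."""
    rng = random.Random(seed)
    n = len(xs)
    theta_hat = estimator(xs)
    replicates = [estimator(rng.choices(xs, k=n)) for _ in range(n_boot)]
    return 2 * theta_hat - fmean(replicates)


# Simulated data, for illustration only.
_rng = random.Random(1)
data = [_rng.gauss(0.0, 1.0) for _ in range(30)]

theta_hat = plugin_variance(data)
theta_jack = jackknife_bias_corrected(plugin_variance, data)
theta_boot = bootstrap_bias_corrected(plugin_variance, data)
```

For this particular toy estimator the jackknife correction reproduces the usual unbiased sample variance exactly, which makes it a convenient sanity check. The bootstrap correction depends on the number of replicates and the random seed, so it only removes the leading bias term approximately.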