Harel Ofer, Zhou Xiao-Hua
Department of Statistics, University of Connecticut, 215 Glenbrook Road Unit 4120, Storrs, CT 06269-4120, USA.
Stat Med. 2007 Jul 20;26(16):3057-77. doi: 10.1002/sim.2787.
Missing data are a common complication in data analysis. In many medical settings, missing data can cause difficulties in estimation, precision, and inference. Multiple imputation (MI) (Multiple Imputation for Nonresponse in Surveys. Wiley: New York, 1987) is a simulation-based approach to dealing with incomplete data. Although there are many methods for handling incomplete data, MI has become one of the leading ones. Since the late 1980s, the use of MI and the publication of MI-related research have increased steadily. This tutorial does not attempt to cover all the material concerning MI; rather, it provides an overview that brings together the theory behind MI and its implementation, and discusses the growing possibilities for using MI with commercial and free software. We illustrate some of the major points using an example from an Alzheimer's disease (AD) study. In this AD study, clinical data are available for all subjects, but postmortem data are available only for the subset who died and underwent an autopsy. Analysis of incomplete data requires making unverifiable assumptions, which are discussed in detail in the text. Relevant S-Plus code is provided.
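As a rough illustration of the final step of an MI analysis, the sketch below pools estimates from m completed data sets using the standard combining rules from the Rubin (1987) reference cited above. It is not the tutorial's S-Plus code. The function name `pool_rubin` and the input numbers are hypothetical, and the abstract itself does not state these formulas, so treat this as a minimal sketch under the assumption that each imputed data set yields one scalar estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's combining rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    (hypothetical helper for illustration only)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)               # pooled point estimate
    w = mean(variances)                   # within-imputation variance
    b = variance(estimates)               # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance of the pooled estimate
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2 if b > 0 else float("inf")
    return {"estimate": q_bar, "within": w, "between": b, "total": t, "df": df}


# Made-up numbers purely for illustration, not results from the AD study.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance adds the between-imputation spread, inflated by (1 + 1/m), to the average within-imputation variance, which is how MI carries the uncertainty due to the missing values into the final standard error.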