How can I deal with missing data in my study?

Author information

Bennett D A

Affiliation

Department of Medicine, University of Auckland, New Zealand.

Publication information

Aust N Z J Public Health. 2001 Oct;25(5):464-9.

PMID:11688629

Abstract

Missing data in medical research is a common problem that has long been recognised by statisticians and medical researchers alike. In general, if the effect of missing data is not taken into account, the results of the statistical analyses will be biased and the amount of variability in the data will not be correctly estimated. There are three main types of missing data pattern: Missing Completely At Random (MCAR), Missing At Random (MAR) and Not Missing At Random (NMAR). The type of missing data that a researcher has in their dataset determines the appropriate method to use in handling the missing data before a formal statistical analysis begins. The aim of this practice note is to describe these patterns of missing data and how they can occur, as well as describing the methods of handling them. Simple and more complex methods are described, including the advantages and disadvantages of each method as well as their availability in routine software. It is good practice to perform a sensitivity analysis employing different missing data techniques in order to assess the robustness of the conclusions drawn from each approach.
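
As an illustration only, and not taken from the practice note itself, the three patterns and the idea of a sensitivity analysis can be sketched with a small simulation. Everything in it is an assumption made for the example: the data-generating model (y = 2 + x + noise), the missingness probabilities, and the helper names `simulate`, `apply_missingness`, `complete_case_mean` and `regression_imputed_mean`. It compares two simple handling methods, complete-case analysis and single regression imputation, under each pattern.

```python
import random
import statistics

TRUE_MEAN_Y = 2.0  # by construction: y = 2 + x + noise, with E[x] = 0


def simulate(n=20000, seed=0):
    """Hypothetical study data: x is always observed, y may go missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 2.0 + x + rng.gauss(0, 1)
        rows.append((x, y))
    return rows


def apply_missingness(rows, mechanism, seed=1):
    """Set y to None according to an assumed missingness mechanism."""
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = 0.3                       # unrelated to any data
        elif mechanism == "MAR":
            p = 0.6 if x > 0 else 0.1     # depends only on observed x
        elif mechanism == "NMAR":
            p = 0.6 if y > 2.0 else 0.1   # depends on the unobserved y itself
        else:
            raise ValueError(mechanism)
        out.append((x, None if rng.random() < p else y))
    return out


def complete_case_mean(rows):
    """Mean of y using only the rows where y was observed."""
    return statistics.fmean([y for _, y in rows if y is not None])


def regression_imputed_mean(rows):
    """Mean of y after filling gaps with a least-squares prediction from x.

    Single imputation like this understates variability; it is used here
    only to compare point estimates across methods.
    """
    obs = [(x, y) for x, y in rows if y is not None]
    mx = statistics.fmean([x for x, _ in obs])
    my = statistics.fmean([y for _, y in obs])
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    filled = [y if y is not None else intercept + slope * x for x, y in rows]
    return statistics.fmean(filled)


# Sensitivity analysis: same data, two handling methods, three mechanisms.
full = simulate()
results = {}
for mechanism in ("MCAR", "MAR", "NMAR"):
    data = apply_missingness(full, mechanism)
    results[mechanism] = (complete_case_mean(data), regression_imputed_mean(data))
    cc, imp = results[mechanism]
    print(f"{mechanism}: complete-case={cc:.3f}  imputed={imp:.3f}  true={TRUE_MEAN_Y}")
```

Under these assumptions, both methods should land near the true mean for MCAR, only the imputed estimate should do so for MAR, and neither should for NMAR. Disagreement between methods on real data is the kind of signal a sensitivity analysis is meant to surface.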
