Pedersen Alma B, Mikkelsen Ellen M, Cronin-Fenton Deirdre, Kristensen Nickolaj R, Pham Tra My, Pedersen Lars, Petersen Irene
Department of Clinical Epidemiology, Aarhus University Hospital, Aarhus N, Denmark.
Department of Primary Care and Population Health, University College London, London, UK.
Clin Epidemiol. 2017 Mar 15;9:157-166. doi: 10.2147/CLEP.S129785. eCollection 2017.
Missing data are ubiquitous in clinical epidemiological research. Individuals with missing data may differ from those with no missing data with respect to the outcome of interest and prognosis in general. Missing data are often categorized into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In clinical epidemiological research, missing data are seldom MCAR. Missing data can pose considerable challenges for the analysis and interpretation of results and can weaken the validity of results and conclusions. A number of methods have been developed for dealing with missing data. These include complete-case analysis, the missing indicator method, single value imputation, and sensitivity analyses incorporating worst-case and best-case scenarios. If applied under the MCAR assumption, some of these methods can provide unbiased but often less precise estimates. Multiple imputation is an alternative method for dealing with missing data that accounts for the uncertainty associated with the missing values. It is implemented in most statistical software under the MAR assumption and, when that assumption holds, provides unbiased and valid estimates of associations based on information from the available data. The method affects not only the coefficient estimates for variables with missing data but also the estimates for other variables with no missing data.
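The statement that multiple imputation accounts for the uncertainty associated with missing data can be made concrete with the standard pooling step usually called Rubin's rules, which the abstract does not name. The sketch below is illustrative only: the function name `pool_rubin` and the five estimates and variances are made-up assumptions, not values from the article. It assumes the same model has already been fitted to each of several imputed datasets, giving one coefficient estimate and one squared standard error per dataset.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules (hypothetical helper).

    estimates: one coefficient estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled estimate, pooled standard error,
             between-imputation variance, within-imputation variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total), between, within


# Illustrative numbers only: five imputed datasets, one coefficient each.
estimates = [0.50, 0.55, 0.45, 0.60, 0.40]
variances = [0.010, 0.012, 0.011, 0.009, 0.010]

pooled, se, between, within = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled:.3f}")
print(f"pooled SE:       {se:.4f}")
print(f"naive SE:        {sqrt(within):.4f}")
```

The pooled standard error exceeds the naive one based on the within-imputation variance alone whenever the estimates differ across imputed datasets. That extra between-imputation term is how the method carries uncertainty about the missing values into the final estimate, which single value imputation does not do.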