Gaps in the usage and reporting of multiple imputation for incomplete data: findings from a scoping review of observational studies addressing causal questions.

Affiliations

Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia.

Department of Paediatrics, The University of Melbourne, Parkville, Victoria, 3052, Australia.

Publication information

BMC Med Res Methodol. 2024 Sep 4;24(1):193. doi: 10.1186/s12874-024-02302-6.

PMID:39232661

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11373423/

Abstract

BACKGROUND

Missing data are common in observational studies and often occur in several of the variables required to estimate a causal effect, i.e., the exposure, the outcome and/or the variables used to control for confounding. Analyses involving multiple incomplete variables are not as straightforward as analyses with a single incomplete variable. For example, in the context of multivariable missingness, the standard missing data assumptions ("missing completely at random", "missing at random" [MAR], "missing not at random") are difficult to interpret and assess. It is unclear how the complexities that arise from multivariable missingness are being addressed in practice. The aim of this study was to review how missing data are managed and reported in observational studies that use multiple imputation (MI) for causal effect estimation, with a particular focus on missing data summaries, missing data assumptions, primary and sensitivity analyses, and MI implementation.
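The following minimal Python sketch makes multivariable missingness concrete. It simulates a confounder, an exposure and an outcome, then deletes values under two hypothetical mechanisms: the confounder is missing completely at random, and the outcome is missing at random given a fully observed exposure. Finally, it tabulates the resulting missingness patterns. The variable names, sample size, seed and deletion probabilities are illustrative assumptions and do not come from the review.

```python
import random
from collections import Counter

VARIABLES = ("exposure", "outcome", "confounder")


def simulate_incomplete(n=2000, seed=2024):
    """Simulate confounder C, exposure X and outcome Y, then delete values.

    Hypothetical mechanisms, for illustration only:
    - confounder: MCAR, each value deleted with probability 0.15
    - outcome: MAR given the fully observed exposure; deleted with
      probability 0.35 when X >= 0 and 0.05 otherwise
    - exposure: never deleted
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.gauss(0, 1)                  # confounder
        x = c + rng.gauss(0, 1)              # exposure depends on confounder
        y = 0.5 * x + c + rng.gauss(0, 1)    # outcome depends on both
        row = {"exposure": x, "outcome": y, "confounder": c}
        if rng.random() < 0.15:
            row["confounder"] = None
        if rng.random() < (0.35 if x >= 0 else 0.05):
            row["outcome"] = None
        rows.append(row)
    return rows


def missingness_patterns(rows):
    """Count each pattern of observed (1) / missing (0) values across VARIABLES."""
    return Counter(
        tuple(int(row[v] is not None) for v in VARIABLES) for row in rows
    )


rows = simulate_incomplete()
patterns = missingness_patterns(rows)
for pattern, count in sorted(patterns.items(), reverse=True):
    print(dict(zip(VARIABLES, pattern)), count)

complete = patterns[(1, 1, 1)] / len(rows)
print(f"complete cases: {complete:.1%}")
```

Here the outcome's missingness depends only on the always-observed exposure, so the MAR statement is easy to write down. If the exposure were also incomplete, outcome missingness would depend on a value that is itself unobserved in some rows. The standard assumptions would then be harder to state for any single variable, which is the interpretive difficulty described above.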

METHODS

We searched five top general epidemiology journals for observational studies that aimed to answer a causal research question and used MI, published between January 2019 and December 2021. Article screening and data extraction were performed systematically.

RESULTS

Of the 130 studies included in this review, 108 (83%) derived an analysis sample by excluding individuals with missing data in specific variables (e.g., the outcome), and 114 (88%) had multivariable missingness within the analysis sample. Forty-four (34%) studies provided a statement about missing data assumptions, 35 of which stated the MAR assumption, but only 11 of these 44 (25%) provided a justification for the assumptions. The number of imputations, MI method and MI software were generally well reported (71%, 75% and 88% of studies, respectively), whereas aspects of the imputation model specification were unclear in more than half of the studies. A secondary analysis that used a different approach to handle the missing data was conducted in 69/130 (53%) studies; of these 69 studies, 68 (99%) lacked a clear justification for the secondary analysis.
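The reported percentages use different denominators. Most are out of the 130 included studies, but the 25% is out of the 44 studies that stated assumptions, and the 99% is out of the 69 studies with a secondary analysis. A small Python check, using only the counts reported above, recomputes each percentage against its own denominator. It assumes rounding to the nearest whole percent. The 71%, 75% and 88% reporting figures are omitted because their numerators are not given.

```python
# Counts taken from the results above: (numerator, denominator, reported %)
REPORTED = {
    "analysis sample derived by excluding missing data": (108, 130, 83),
    "multivariable missingness in analysis sample": (114, 130, 88),
    "statement about missing data assumptions": (44, 130, 34),
    "justification, among studies with a statement": (11, 44, 25),
    "secondary analysis with a different missing-data approach": (69, 130, 53),
    "no clear justification, among secondary analyses": (68, 69, 99),
}


def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number (assumed rounding rule)."""
    return round(100 * numerator / denominator)


for label, (num, den, reported) in REPORTED.items():
    print(f"{label}: {num}/{den} = {percent(num, den)}% (reported {reported}%)")
```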

CONCLUSION

Effort is needed to clarify the rationale for and improve the reporting of MI for estimation of causal effects from observational data. We encourage greater transparency in making and reporting analytical decisions related to missing data.
