随机缺失情况下缺失数据的多重填补：在填补模型中纳入一个对撞机作为辅助变量会导致偏差。

Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias.

作者信息

Curnow Elinor, Tilling Kate, Heron Jon E, Cornish Rosie P, Carpenter James R

机构信息

Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom.

Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, United Kingdom.

出版信息

Front Epidemiol. 2023 Sep 15;3:1237447. doi: 10.3389/fepid.2023.1237447.

DOI:10.3389/fepid.2023.1237447

PMID:37974561

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC7615309/

Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables ("auxiliary variables"). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a "collider"), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible casual diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.

摘要

流行病学研究常常存在缺失数据，通常采用多重填补（MI）方法来处理。在多重填补中，除了实质性分析所需的变量外，填补模型通常还包括其他变量（“辅助变量”）。能够预测部分观测变量的辅助变量可以降低多重填补估计量的标准误差（SE），并且，如果它们还能预测数据缺失的概率，还可以减少因数据非随机缺失而导致的偏差。然而，目前缺乏关于选择辅助变量的指导。我们研究了选择不当的辅助变量所带来的后果：如果它与部分观测变量以及其自身缺失的概率有共同的原因（即它是一个“对撞机”），那么将其纳入可能会在多重填补估计量中引入偏差，并且可能会增加标准误差。我们通过代数方法和模拟，量化了暴露或结局不完全时偏差和标准误差的大小。当实质性分析的结局部分被观测到时，相对于暴露系数的大小，偏差可能会很大。在完整记录分析有效的情况下，当暴露部分被观测到时，偏差较小。然而，如果结局也导致暴露数据缺失，偏差可能会更大。在使用多重填补时，通过数据探索、考虑合理的因果图和缺失机制相结合的方式，检查潜在的辅助变量是否为对撞机是很重要的。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

随机缺失情况下缺失数据的多重填补：在填补模型中纳入一个对撞机作为辅助变量会导致偏差。

Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias.

作者信息

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献

随机缺失情况下缺失数据的多重填补：在填补模型中纳入一个对撞机作为辅助变量会导致偏差。

Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias.

作者信息

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献