Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random.

Affiliations

Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.

Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK.

Publication information

BMC Med Res Methodol. 2024 Oct 7;24(1):231. doi: 10.1186/s12874-024-02353-9.

DOI:10.1186/s12874-024-02353-9
PMID:39375597
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11457445/
Abstract

BACKGROUND

Epidemiological and clinical studies often have missing data, frequently analysed using multiple imputation (MI). In general, MI estimates will be biased if data are missing not at random (MNAR). Bias due to data MNAR can be reduced by including other variables ("auxiliary variables") in imputation models, in addition to those required for the substantive analysis. Common advice is to take an inclusive approach to auxiliary variable selection (i.e. include all variables thought to be predictive of missingness and/or the missing values). There are no clear guidelines about the impact of this strategy when data may be MNAR.

METHODS

We explore the impact of including an auxiliary variable predictive of missingness but, in truth, unrelated to the partially observed variable, when data are MNAR. We quantify, algebraically and by simulation, the magnitude of the additional bias of the MI estimator for the exposure coefficient (fitting either a linear or logistic regression model), when the (continuous or binary) partially observed variable is either the analysis outcome or the exposure. Here, "additional bias" refers to the difference in magnitude of the MI estimator when the imputation model includes (i) the auxiliary variable and the other analysis model variables; (ii) just the other analysis model variables, noting that both will be biased due to data MNAR. We illustrate the extent of this additional bias by re-analysing data from a birth cohort study.
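The comparison described above can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not the authors' simulation design: the function name `simulate_additional_bias`, its parameters, the threshold-on-a-latent-normal-score missingness mechanism, and the simplified imputation step (regression imputation with residual noise, without drawing the imputation-model parameters) are all hypothetical choices made for brevity. The outcome `y` is partially observed, missingness depends on `y` itself (MNAR) and on an auxiliary variable `z` that is unrelated to `y`.

```python
import numpy as np


def simulate_additional_bias(n=200_000, beta=1.0, a=1.0, c=2.0, m=5, seed=0):
    """Compare MI slopes with and without an auxiliary variable that
    predicts missingness only (all settings here are illustrative)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                 # exposure, fully observed
    z = rng.normal(size=n)                 # auxiliary: independent of x and y
    y = beta * x + rng.normal(size=n)      # outcome, partially observed

    # Assumed MNAR mechanism: y is observed when a latent score that depends
    # on y itself and on z is positive.
    observed = a * y + c * z + rng.normal(size=n) > 0
    n_missing = int((~observed).sum())

    def mi_slope(predictors):
        # Fit the imputation model in the complete records.
        design = np.column_stack([np.ones(n)] + predictors)
        coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
        resid_sd = np.std(y[observed] - design[observed] @ coef)

        slopes = []
        for _ in range(m):
            # Simplification: impute from the fitted model plus residual
            # noise, without drawing the imputation-model parameters.
            y_imp = y.copy()
            y_imp[~observed] = design[~observed] @ coef + rng.normal(
                scale=resid_sd, size=n_missing
            )
            # Analysis model: linear regression of y on x (slope first).
            slopes.append(np.polyfit(x, y_imp, 1)[0])
        return float(np.mean(slopes))      # pooled point estimate

    slope_without_aux = mi_slope([x])      # imputation model (ii)
    slope_with_aux = mi_slope([x, z])      # imputation model (i)
    return {
        "slope_without_aux": slope_without_aux,
        "slope_with_aux": slope_with_aux,
        "additional_bias": abs(slope_with_aux - beta) - abs(slope_without_aux - beta),
    }


if __name__ == "__main__":
    print(simulate_additional_bias())
```

Both estimates should fall below the true coefficient `beta` because the data are MNAR. Under these assumed settings, the estimate from the imputation model that includes `z` is expected to land further from `beta`, which is the "additional bias" defined in the paragraph above.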

RESULTS

The additional bias can be relatively large when the outcome is partially observed and missingness is caused by the outcome itself, and even larger if missingness is caused by both the outcome and the exposure (when either the outcome or exposure is partially observed).

CONCLUSIONS

When using MI, the naïve and commonly used strategy of including all available auxiliary variables should be avoided. We recommend including the variables most predictive of the partially observed variable as auxiliary variables, where these can be identified through consideration of the plausible causal diagrams and missingness mechanisms, as well as data exploration (noting that associations with the partially observed variable in the complete records may be distorted due to selection bias).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/46dea6ca2545/12874_2024_2353_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/257f0f0ab413/12874_2024_2353_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/6b2ac8e2f58e/12874_2024_2353_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/84b984a80a30/12874_2024_2353_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/7ed5e65e49c7/12874_2024_2353_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/8e7114c3fa0c/12874_2024_2353_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/2d3b09983e1e/12874_2024_2353_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72dd/11457445/0b66fbc78edf/12874_2024_2353_Fig8_HTML.jpg

Similar articles

1
Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random.
BMC Med Res Methodol. 2024 Oct 7;24(1):231. doi: 10.1186/s12874-024-02353-9.
2
Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias.
Front Epidemiol. 2023 Sep 15;3:1237447. doi: 10.3389/fepid.2023.1237447.
3
Analyses Using Multiple Imputation Need to Consider Missing Data in Auxiliary Variables.
Am J Epidemiol. 2024 Aug 27. doi: 10.1093/aje/kwae306.
4
Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study.
Emerg Themes Epidemiol. 2017 Dec 19;14:14. doi: 10.1186/s12982-017-0068-0. eCollection 2017.
5
Bias and Precision of the "Multiple Imputation, Then Deletion" Method for Dealing With Missing Outcome Data.
Am J Epidemiol. 2015 Sep 15;182(6):528-34. doi: 10.1093/aje/kwv100. Epub 2015 Sep 2.
6
Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study.
BMC Med Res Methodol. 2010 Jan 19;10:7. doi: 10.1186/1471-2288-10-7.
7
Complete case logistic regression with a dichotomised continuous outcome led to biased estimates.
J Clin Epidemiol. 2023 Feb;154:33-41. doi: 10.1016/j.jclinepi.2022.11.022. Epub 2022 Dec 1.
8
Dealing with missing delirium assessments in prospective clinical studies of the critically ill: a simulation study and reanalysis of two delirium studies.
BMC Med Res Methodol. 2021 May 6;21(1):97. doi: 10.1186/s12874-021-01274-1.
9
A Bayesian Latent Variable Selection Model for Nonignorable Missingness.
Multivariate Behav Res. 2022 Mar-May;57(2-3):478-512. doi: 10.1080/00273171.2021.1874259. Epub 2021 Feb 2.
10
Multiple imputation with missing data indicators.
Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.

Cited by

1
How much missing data is too much to impute for longitudinal health indicators? A preliminary guideline for the choice of the extent of missing proportion to impute with multiple imputation by chained equations.
Popul Health Metr. 2025 Feb 1;23(1):2. doi: 10.1186/s12963-025-00364-2.
2
Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study.
BMC Med Res Methodol. 2024 Nov 13;24(1):278. doi: 10.1186/s12874-024-02382-4.
