

Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias.

Author information

Curnow Elinor, Tilling Kate, Heron Jon E, Cornish Rosie P, Carpenter James R

Affiliations

Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom.

Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, United Kingdom.

Publication information

Front Epidemiol. 2023 Sep 15;3:1237447. doi: 10.3389/fepid.2023.1237447.

DOI:10.3389/fepid.2023.1237447
PMID:37974561
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7615309/
Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables ("auxiliary variables"). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a "collider"), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
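The mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation code: the causal diagram (X → Y; U1 → Y and U1 → Z; U2 → Z and U2 → missingness of Y), the effect sizes, the hypothetical rule that Y is missing only when X = 1 and U2 < 0, and all function names are assumptions chosen for illustration. Z shares a common cause with Y (U1) and with the missingness of Y (U2), so it is a collider on the path Y ← U1 → Z ← U2 → R. Because Y is independent of its missingness given X, a complete records analysis and MI using X alone should both recover the exposure coefficient, whereas adding Z to the imputation model opens that path.

```python
import math
import random
import statistics


def simulate(n, beta, rng):
    """Simulate data from an assumed DAG (illustrative, not the paper's setup).

    X -> Y, U1 -> Y, U1 -> Z, U2 -> Z and (X, U2) -> missingness of Y, so Z is
    a collider on the path Y <- U1 -> Z <- U2 -> R. U1 and U2 are unobserved.
    """
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0   # binary exposure
        u1 = rng.gauss(0.0, 1.0)             # common cause of Y and Z
        u2 = rng.gauss(0.0, 1.0)             # common cause of Z and missingness
        y = beta * x + u1 + rng.gauss(0.0, 1.0)
        z = u1 + u2 + rng.gauss(0.0, 0.5)    # candidate auxiliary variable
        # Hypothetical rule: Y is missing only when x == 1 and u2 < 0. Given X,
        # missingness is independent of Y, so complete records analysis is valid.
        observed = x == 0 or u2 > 0
        rows.append((x, y if observed else None, z))
    return rows


def mean_difference(xs, ys):
    """Substantive analysis: OLS slope of Y on binary X (a difference in means)."""
    y1 = [y for x, y in zip(xs, ys) if x == 1]
    y0 = [y for x, y in zip(xs, ys) if x == 0]
    return statistics.fmean(y1) - statistics.fmean(y0)


def fit_imputation_model(rows, use_z):
    """Linear regression of Y on X (and optionally Z) in the complete records.

    With an intercept and a binary X, OLS equals the X-group means plus a pooled
    within-group slope on Z, so no matrix library is needed.
    """
    obs = [(x, y, z) for x, y, z in rows if y is not None]
    centres = {}
    for g in (0, 1):
        ys = [y for x, y, _ in obs if x == g]
        zs = [z for x, _, z in obs if x == g]
        centres[g] = (statistics.fmean(ys), statistics.fmean(zs))
    szy = szz = 0.0
    for x, y, z in obs:
        ybar, zbar = centres[x]
        szy += (z - zbar) * (y - ybar)
        szz += (z - zbar) ** 2
    slope = szy / szz if use_z else 0.0
    ssr = sum((y - centres[x][0] - slope * (z - centres[x][1])) ** 2
              for x, y, z in obs)
    n_params = 3 if use_z else 2
    sigma = math.sqrt(ssr / (len(obs) - n_params))
    return centres, slope, sigma


def mi_estimate(rows, use_z, m, rng):
    """Average the substantive estimate over m stochastically imputed data sets.

    Simplification: the imputation-model parameters are not redrawn for each
    imputation, as proper MI requires; this barely affects the point estimate
    in a large sample but would understate the SE.
    """
    centres, slope, sigma = fit_imputation_model(rows, use_z)
    estimates = []
    for _ in range(m):
        xs, ys = [], []
        for x, y, z in rows:
            if y is None:
                ybar, zbar = centres[x]
                y = ybar + slope * (z - zbar) + rng.gauss(0.0, sigma)
            xs.append(x)
            ys.append(y)
        estimates.append(mean_difference(xs, ys))
    return statistics.fmean(estimates)


def complete_records_estimate(rows):
    """Substantive analysis restricted to records with Y observed."""
    xs = [x for x, y, _ in rows if y is not None]
    ys = [y for _, y, _ in rows if y is not None]
    return mean_difference(xs, ys)


rng = random.Random(2023)
data = simulate(20_000, beta=1.0, rng=rng)
cra = complete_records_estimate(data)
mi_x_only = mi_estimate(data, use_z=False, m=5, rng=rng)
mi_with_collider = mi_estimate(data, use_z=True, m=5, rng=rng)
print("true exposure coefficient: 1.00")
print(f"complete records analysis: {cra:.2f}")
print(f"MI with X only:            {mi_x_only:.2f}")
print(f"MI with X and collider Z:  {mi_with_collider:.2f}")
```

Under these assumptions the complete records and X-only MI estimates should be close to the true value of 1, while MI that also uses Z should be noticeably lower (a rough calculation for this particular setup puts it near 0.6). The imputation here is stochastic regression without parameter draws, so it is a simplified stand-in for proper MI and should not be used for standard errors.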


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/c4cb295e2ace/fepid-03-1237447-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/83a17dcf4c69/fepid-03-1237447-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/180ac0ebed3c/fepid-03-1237447-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/c77ca0a2db37/fepid-03-1237447-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/4e8f5f320dd2/fepid-03-1237447-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/fce50ff1b929/fepid-03-1237447-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/09fff22c2d6c/fepid-03-1237447-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75f5/10910950/89150a434dfa/fepid-03-1237447-g008.jpg

Similar articles

1
Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias.
Front Epidemiol. 2023 Sep 15;3:1237447. doi: 10.3389/fepid.2023.1237447.
2
Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random.
BMC Med Res Methodol. 2024 Oct 7;24(1):231. doi: 10.1186/s12874-024-02353-9.
3
Analyses Using Multiple Imputation Need to Consider Missing Data in Auxiliary Variables.
Am J Epidemiol. 2024 Aug 27. doi: 10.1093/aje/kwae306.
4
Bias and Precision of the "Multiple Imputation, Then Deletion" Method for Dealing With Missing Outcome Data.
Am J Epidemiol. 2015 Sep 15;182(6):528-34. doi: 10.1093/aje/kwv100. Epub 2015 Sep 2.
5
Multiple imputation using linked proxy outcome data resulted in important bias reduction and efficiency gains: a simulation study.
Emerg Themes Epidemiol. 2017 Dec 19;14:14. doi: 10.1186/s12982-017-0068-0. eCollection 2017.
6
A Causal View on Bias in Missing Data Imputation: The Impact of Evil Auxiliary Variables on Norming of Test Scores.
Multivariate Behav Res. 2025 Mar-Apr;60(2):258-274. doi: 10.1080/00273171.2024.2412682. Epub 2024 Oct 20.
7
Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study.
BMC Med Res Methodol. 2024 Nov 13;24(1):278. doi: 10.1186/s12874-024-02382-4.
8
Accounting for missing data in statistical analyses: multiple imputation is not always the answer.
Int J Epidemiol. 2019 Aug 1;48(4):1294-1304. doi: 10.1093/ije/dyz032.
9
The proportion of missing data should not be used to guide decisions on multiple imputation.
J Clin Epidemiol. 2019 Jun;110:63-73. doi: 10.1016/j.jclinepi.2019.02.016. Epub 2019 Mar 13.
10
Correction of bias from non-random missing longitudinal data using auxiliary information.
Stat Med. 2010 Mar 15;29(6):671-9. doi: 10.1002/sim.3821.

Cited by

1
Pitfalls of imputing using incomplete auxiliary variables.
Am J Epidemiol. 2025 Jun 3;194(6):1801-1802. doi: 10.1093/aje/kwaf043.
2
How much missing data is too much to impute for longitudinal health indicators? A preliminary guideline for the choice of the extent of missing proportion to impute with multiple imputation by chained equations.
Popul Health Metr. 2025 Feb 1;23(1):2. doi: 10.1186/s12963-025-00364-2.
3
Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random.
