Using the outcome for imputation of missing predictor values was preferred.

Author information

Moons Karel G M, Donders Rogier A R T, Stijnen Theo, Harrell Frank E

Affiliation

Julius Center for Health Sciences and General Practice, University Medical Center, Utrecht, P.O. Box 80035, 3508 GA Utrecht, The Netherlands.

Publication information

J Clin Epidemiol. 2006 Oct;59(10):1092-101. doi: 10.1016/j.jclinepi.2006.01.009. Epub 2006 Jun 19.

DOI:10.1016/j.jclinepi.2006.01.009
PMID:16980150
Abstract

BACKGROUND AND OBJECTIVE

Epidemiologic studies commonly estimate associations between predictors (risk factors) and an outcome. Most software automatically excludes subjects with missing values. This commonly causes bias, because missing values seldom occur completely at random (MCAR) but rather selectively, depending on other (observed) variables, that is, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information, including the outcome, is advocated to deal with selectively missing values. Using the outcome to impute the predictors may, however, seem a self-fulfilling prophecy.
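A minimal sketch of the two missingness mechanisms, assuming synthetic standard-normal data rather than the study's pulmonary-embolism data: a predictor `x1` loses values either completely at random (MCAR) or with a probability that rises with a fully observed predictor `x2` (MAR). The names `x1`, `x2` and `make_masks`, the 40% missing rate and the logistic selection rule are all illustrative choices, not taken from the paper. Because `x1` and `x2` are correlated, the MAR complete cases are no longer a random subsample, so even a simple complete-case mean of `x1` drifts away from the full-data mean.

```python
import numpy as np

def make_masks(x2, p_missing, rng):
    """Return boolean 'x1 is missing' masks under MCAR and under MAR."""
    n = x2.shape[0]
    # MCAR: every row has the same chance of losing x1.
    mcar = rng.random(n) < p_missing
    # MAR: the chance rises with the fully observed x2 (illustrative logistic rule),
    # rescaled so the average missing rate is still about p_missing.
    score = 1.0 / (1.0 + np.exp(-2.0 * x2))
    mar = rng.random(n) < score * (p_missing / score.mean())
    return mcar, mar

rng = np.random.default_rng(2006)
n = 20_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(scale=np.sqrt(0.75), size=n)  # corr(x1, x2) = 0.5, var(x1) = 1

mcar, mar = make_masks(x2, p_missing=0.4, rng=rng)
cc_mcar = x1[~mcar].mean()  # complete-case mean under MCAR
cc_mar = x1[~mar].mean()    # complete-case mean under MAR
print(f"full data: {x1.mean():.3f}  MCAR: {cc_mcar:.3f}  MAR: {cc_mar:.3f}")
```

Under MCAR the complete-case mean stays close to the full-data value; under MAR it shifts downward, because rows with high `x2` (and therefore typically high `x1`) are preferentially dropped.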

METHODS

We tested this hypothesis using data from a study on the diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism that had no missing values. Their regression coefficients and standard errors (SEs), estimated from the original sample, were taken as the "true" values. We then assigned missing values to these predictors, under both MCAR and MAR mechanisms, and repeated this 1,000 times in simulations. In each simulation, we multiply imputed the missing values without and with the outcome, and compared the resulting regression coefficients and SEs with the truth.
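The comparison described in the Methods can be sketched on synthetic data. This is not the authors' code: it assumes a linear outcome model instead of their logistic model for pulmonary embolism, a single incomplete predictor `x1` (MCAR, 40% missing) plus one complete predictor `x2`, and a Bayesian normal-linear imputation model (draw the residual variance and the regression weights, then add residual noise), which is one common way to make imputations "proper". All function names are hypothetical. As in the study design, the coefficients estimated from the complete sample serve as the "truth".

```python
import numpy as np

def ols(X, y):
    """OLS fit; X must already contain an intercept column.
    Returns (coefficients, SEs, (X'X)^-1, residuals)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * XtX_inv)), XtX_inv, resid

def impute_once(x1, miss, Z, rng):
    """One Bayesian normal-linear imputation of the missing x1 values given columns Z."""
    Zo, xo = Z[~miss], x1[~miss]
    beta_hat, _, XtX_inv, resid = ols(Zo, xo)
    df = Zo.shape[0] - Zo.shape[1]
    sigma2 = resid @ resid / rng.chisquare(df)          # posterior draw of residual variance
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)  # posterior draw of weights
    out = x1.copy()
    out[miss] = Z[miss] @ beta + rng.normal(scale=np.sqrt(sigma2), size=miss.sum())
    return out

def mi_coefficients(x1, x2, y, miss, use_outcome, m, rng):
    """Impute x1 m times (with or without y as an imputation predictor), fit
    y ~ x1 + x2 on each completed data set, and average the coefficients."""
    ones = np.ones_like(x2)
    Z = np.column_stack([ones, x2, y] if use_outcome else [ones, x2])
    betas = []
    for _ in range(m):
        x1_imp = impute_once(x1, miss, Z, rng)
        beta, _, _, _ = ols(np.column_stack([ones, x1_imp, x2]), y)
        betas.append(beta)
    return np.mean(betas, axis=0)

rng = np.random.default_rng(1092)
n = 2_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(scale=np.sqrt(0.75), size=n)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

# "Truth": estimates from the complete sample, before any values are deleted.
b_true = ols(np.column_stack([np.ones(n), x1, x2]), y)[0]

miss = rng.random(n) < 0.4  # MCAR: 40% of x1 values deleted at random
b_with = mi_coefficients(x1, x2, y, miss, use_outcome=True, m=10, rng=rng)
b_without = mi_coefficients(x1, x2, y, miss, use_outcome=False, m=10, rng=rng)
print(f"x1 coefficient  truth: {b_true[1]:.3f}  "
      f"MI with y: {b_with[1]:.3f}  MI without y: {b_without[1]:.3f}")
```

Imputing without `y` draws `x1` only from its relation with `x2`, so the imputed values carry no information about the outcome and the `x1` coefficient is pulled toward zero (in this particular setup it lands roughly near 0.6, close to the 60% of values that were observed). Including `y` in the imputation model preserves the association between `x1` and `y`. Swapping the `miss` line for a mask whose probability depends on `x2` gives a MAR variant of the same experiment.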

RESULTS

Regression coefficients based on MI including the outcome were close to the truth. MI without the outcome yielded strongly biased (underestimated) coefficients. SEs and coverage of the 90% confidence intervals did not differ between MI with and without the outcome. Results were the same for MCAR and MAR.
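The abstract compares SEs and 90% confidence-interval coverage but does not say how results were combined across the imputed data sets. The sketch below assumes Rubin's rules, the standard way to pool MI results: average the point estimates, then add the mean within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The helper `rubin_pool` is hypothetical, and the estimates and SEs are made-up numbers for illustration, not results from the study.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Combine m per-imputation estimates and their variances (squared SEs) with
    Rubin's rules. Returns (pooled estimate, total SE, reference degrees of freedom).
    Assumes the between-imputation variance is > 0."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                   # pooled point estimate
    w = u.mean()                       # within-imputation variance
    b = q.var(ddof=1)                  # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b        # total variance
    r = (1.0 + 1.0 / m) * b / w        # relative increase in variance due to missingness
    df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return q_bar, np.sqrt(t), df

# Hypothetical coefficients and SEs for one predictor from m = 5 imputed data sets.
est = [0.98, 1.02, 1.01, 0.97, 1.02]
se = [0.030, 0.031, 0.029, 0.030, 0.031]
q_bar, se_total, df = rubin_pool(est, np.square(se))
print(f"pooled estimate {q_bar:.3f}, SE {se_total:.4f}, df {df:.1f}")
```

A 90% interval is then `q_bar ± t(0.95, df) × se_total`; with many degrees of freedom this approaches `q_bar ± 1.645 × se_total`. Ignoring the between-imputation term `b` would understate the SE.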

CONCLUSION

For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without the outcome and is not a self-fulfilling prophecy.

Similar articles

1. Using the outcome for imputation of missing predictor values was preferred.
J Clin Epidemiol. 2006 Oct;59(10):1092-101. doi: 10.1016/j.jclinepi.2006.01.009. Epub 2006 Jun 19.
2. Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example.
J Clin Epidemiol. 2006 Oct;59(10):1102-9. doi: 10.1016/j.jclinepi.2006.01.015. Epub 2006 Jul 11.
3. Dealing with missing data in a multi-question depression scale: a comparison of imputation methods.
BMC Med Res Methodol. 2006 Dec 13;6:57. doi: 10.1186/1471-2288-6-57.
4. Missing data on the Center for Epidemiologic Studies Depression Scale: a comparison of 4 imputation techniques.
Res Social Adm Pharm. 2007 Mar;3(1):1-27. doi: 10.1016/j.sapharm.2006.04.001.
5. Multiple imputation for missing income data in population-based health surveillance.
J Public Health Manag Pract. 2009 Nov-Dec;15(6):E12-21. doi: 10.1097/PHH.0b013e3181aab5f7.
6. Missing data and imputation: a practical illustration in a prognostic study on low back pain.
J Manipulative Physiol Ther. 2012 Jul;35(6):464-71. doi: 10.1016/j.jmpt.2012.07.002.
7. Modeling major lung resection outcomes using classification trees and multiple imputation techniques.
Eur J Cardiothorac Surg. 2008 Nov;34(5):1085-9. doi: 10.1016/j.ejcts.2008.07.037. Epub 2008 Aug 29.
8. [Multiple imputation of missing at random data: General points and presentation of a Monte-Carlo method].
Rev Epidemiol Sante Publique. 2009 Oct;57(5):361-72. doi: 10.1016/j.respe.2009.04.011. Epub 2009 Aug 11.
9. Unpredictable bias when using the missing indicator method or complete case analysis for missing confounder values: an empirical example.
J Clin Epidemiol. 2010 Jul;63(7):728-36. doi: 10.1016/j.jclinepi.2009.08.028. Epub 2010 Mar 25.
10. Multiple imputation in veterinary epidemiological studies: a case study and simulation.
Prev Vet Med. 2016 Jul 1;129:35-47. doi: 10.1016/j.prevetmed.2016.04.003. Epub 2016 May 13.

Cited by

1. APOE Genotype and Statin Response: Evidence From the UK Biobank and All of Us Program.
Clin Transl Sci. 2025 Aug;18(8):e70314. doi: 10.1111/cts.70314.
2. The association between cannabis use and adherence to COVID-19 public health guidelines: prospective analyses from the French CONSTANCES cohort and SAPRIS survey.
Harm Reduct J. 2025 Jul 18;22(1):121. doi: 10.1186/s12954-025-01278-w.
3. Development, validation, and updating of prognostic models for m7G-associated genes from TAMs in lower-grade gliomas.
Sci Rep. 2025 Jul 11;15(1):25146. doi: 10.1038/s41598-025-10275-9.
4. Perinatal risk assessment in pregnancies complicated by early-onset fetal growth restriction: development and internal validation of a prediction model for composite adverse perinatal outcome.
Ultrasound Obstet Gynecol. 2025 Aug;66(2):175-185. doi: 10.1002/uog.29265. Epub 2025 Jul 7.
5. Early predictors of late childhood behavioural outcomes following very preterm birth.
Psychol Med. 2025 Jul 7;55:e189. doi: 10.1017/S0033291725001151.
6. Liver-related outcomes in patients with cirrhosis: The value of clinical and laboratory data and noninvasive tests.
PLoS One. 2025 Jul 1;20(7):e0326702. doi: 10.1371/journal.pone.0326702. eCollection 2025.
7. Community health workers identify children requiring health center admission in Northern Uganda: prehospital risk prediction using vital signs and advanced point-of-care tests.
Glob Health Action. 2025 Dec;18(1):2519704. doi: 10.1080/16549716.2025.2519704. Epub 2025 Jun 26.
8. Clinical, genetic, and sociodemographic predictors of symptom severity after internet-delivered cognitive behavioural therapy for depression and anxiety.
BMC Psychiatry. 2025 May 30;25(1):555. doi: 10.1186/s12888-025-07012-x.
9. A Multiple Imputation Workflow for Handling Missing Covariate Data in Pharmacometrics Modeling.
CPT Pharmacometrics Syst Pharmacol. 2025 Jun;14(6):991-1005. doi: 10.1002/psp4.70039. Epub 2025 May 29.
10. Comparison of methods to handle missing values in a continuous index test in a diagnostic accuracy study - a simulation study.
BMC Med Res Methodol. 2025 May 27;25(1):147. doi: 10.1186/s12874-025-02594-2.