A critical look at methods for handling missing covariates in epidemiologic regression analyses.

Author information

Greenland S, Finkle W D

Affiliation

Department of Epidemiology, UCLA School of Public Health, 90095-1772, USA.

Publication information

Am J Epidemiol. 1995 Dec 15;142(12):1255-64. doi: 10.1093/oxfordjournals.aje.a117592.

DOI:10.1093/oxfordjournals.aje.a117592

PMID:7503045

Abstract

Epidemiologic studies often encounter missing covariate values. While simple methods such as stratification on missing-data status, conditional-mean imputation, and complete-subject analysis are commonly employed for handling this problem, several studies have shown that these methods can be biased under reasonable circumstances. The authors review these results in the context of logistic regression and present simulation experiments showing the limitations of the methods. The method based on missing-data indicators can exhibit severe bias even when the data are missing completely at random, and regression (conditional-mean) imputation can be inordinately sensitive to model misspecification. Even complete-subject analysis can outperform these methods. More sophisticated methods, such as maximum likelihood, multiple imputation, and weighted estimating equations, have been given extensive attention in the statistics literature. While these methods are superior to simple methods, they are not commonly used in epidemiology, no doubt due to their complexity and the lack of packaged software to apply these methods. The authors contrast the results of multiple imputation to simple methods in the analysis of a case-control study of endometrial cancer, and they find a meaningful difference in results for age at menarche. In general, the authors recommend that epidemiologists avoid using the missing-indicator method and use more sophisticated methods whenever a large proportion of data are missing.
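The abstract's central warning, that the missing-indicator method can be biased even when data are missing completely at random (MCAR), can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' simulation design: it swaps in ordinary least-squares regression for logistic regression so that it runs with the Python standard library alone, and the effect sizes, correlation, sample size, and missingness rate are hypothetical choices. A confounder Z, correlated with exposure X, is deleted at random for about half the subjects. Complete-subject analysis drops those subjects, while the missing-indicator method sets Z to 0 and adds a 0/1 missingness indicator. The names `ols` and `simulate` are illustrative helpers, not part of any library.

```python
import random


def ols(rows, ys):
    """Ordinary least squares via the normal equations (X'X) beta = X'y,
    solved by Gaussian elimination with partial pivoting."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for r, y in zip(rows, ys):
        for i in range(p):
            xty[i] += r[i] * y
            for j in range(p):
                xtx[i][j] += r[i] * r[j]
    # Augmented matrix [X'X | X'y]
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for j in range(col, p + 1):
                a[k][j] -= f * a[col][j]
    # Back substitution on the upper-triangular system
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = a[i][p] - sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / a[i][i]
    return beta


def simulate(n=20000, p_missing=0.5, b_x=1.0, b_z=2.0, seed=1):
    """Return the X coefficient from complete-subject analysis and from
    the missing-indicator method (hypothetical parameter values)."""
    rng = random.Random(seed)
    cc_rows, cc_y = [], []
    mi_rows, mi_y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = 0.8 * z + rng.gauss(0.0, 0.6)  # Var(X) = 1, Corr(X, Z) = 0.8
        y = b_x * x + b_z * z + rng.gauss(0.0, 1.0)
        missing = rng.random() < p_missing  # MCAR: independent of x, y, z
        if not missing:
            # Complete-subject analysis keeps only subjects with Z observed
            cc_rows.append([1.0, x, z])
            cc_y.append(y)
        # Missing-indicator method: fill Z with 0 and add an indicator column
        mi_rows.append([1.0, x, 0.0 if missing else z, 1.0 if missing else 0.0])
        mi_y.append(y)
    b_cc = ols(cc_rows, cc_y)[1]
    b_mi = ols(mi_rows, mi_y)[1]
    return b_cc, b_mi


if __name__ == "__main__":
    b_cc, b_mi = simulate()
    print(f"complete-subject: {b_cc:.3f}  missing-indicator: {b_mi:.3f}")
```

Under these assumed settings, the complete-subject estimate of the X coefficient should fall close to the true value of 1.0, whereas the missing-indicator estimate should fall well above it: among subjects with Z missing, the X coefficient absorbs the confounding by Z, which the indicator term cannot remove.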

Similar articles

A critical look at methods for handling missing covariates in epidemiologic regression analyses.

Am J Epidemiol. 1995 Dec 15;142(12):1255-64. doi: 10.1093/oxfordjournals.aje.a117592.

Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example.

J Clin Epidemiol. 2006 Oct;59(10):1102-9. doi: 10.1016/j.jclinepi.2006.01.015. Epub 2006 Jul 11.

Multiple imputation for handling missing outcome data when estimating the relative risk.

BMC Med Res Methodol. 2017 Sep 6;17(1):134. doi: 10.1186/s12874-017-0414-5.

Comparison of the missing-indicator method and conditional logistic regression in 1:m matched case-control studies with missing exposure values.

Am J Epidemiol. 2004 Mar 15;159(6):603-10. doi: 10.1093/aje/kwh075.

A comparison of different methods to handle missing data in the context of propensity score analysis.

Eur J Epidemiol. 2019 Jan;34(1):23-36. doi: 10.1007/s10654-018-0447-z. Epub 2018 Oct 19.

Cox regression analysis with missing covariates via nonparametric multiple imputation.

Stat Methods Med Res. 2019 Jun;28(6):1676-1688. doi: 10.1177/0962280218772592. Epub 2018 May 2.

Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.

Int J Biostat. 2017 Apr 20;13(1):/j/ijb.2017.13.issue-1/ijb-2016-0053/ijb-2016-0053.xml. doi: 10.1515/ijb-2016-0053.

A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

BMC Med Res Methodol. 2017 Jul 25;17(1):114. doi: 10.1186/s12874-017-0372-y.

Review: a gentle introduction to imputation of missing values.

J Clin Epidemiol. 2006 Oct;59(10):1087-91. doi: 10.1016/j.jclinepi.2006.01.014. Epub 2006 Jul 11.

Missing covariate data in medical research: to impute is better than to ignore.

J Clin Epidemiol. 2010 Jul;63(7):721-7. doi: 10.1016/j.jclinepi.2009.12.008. Epub 2010 Mar 24.

Cited by

Development and Preliminary Validation of the Sexual Minority Adolescent Stress Inventory - Short Form (SMASI-SF).

Psychol Sex Orientat Gend Divers. 2024 Mar 21. doi: 10.1037/sgd0000706.

The impact of early special educational needs provision on later hospital admissions, school absence and education attainment: A target trial emulation study of children with isolated cleft lip and/or palate.

PLoS One. 2025 Jul 16;20(7):e0327720. doi: 10.1371/journal.pone.0327720. eCollection 2025.

Associations Between Mothers' COVID-Related Perceived Stress and Children's Internalizing and Externalizing Symptoms in Peru.

Child Psychiatry Hum Dev. 2025 Jun 27. doi: 10.1007/s10578-025-01872-w.

Synergistic effects of cardiovascular health and social isolation on adverse pregnancy outcomes.

Sci Rep. 2025 May 29;15(1):18924. doi: 10.1038/s41598-025-03652-x.

All-cancer incidence and mortality in Pakistanis, Bangladeshis, and their descendants in England and Wales.

BMC Public Health. 2024 Dec 2;24(1):3352. doi: 10.1186/s12889-024-20813-1.

Adjusting for incomplete baseline covariates in randomized controlled trials: a cross-world imputation framework.

Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae094.

Associations of ambient air pollution and lifestyle with the risk of NAFLD: a population-based cohort study.

BMC Public Health. 2024 Aug 29;24(1):2354. doi: 10.1186/s12889-024-19761-7.

Invited commentary: mixing multiple imputation and bootstrapping for variance estimation.

Am J Epidemiol. 2024 Oct 7;193(10):1477-1481. doi: 10.1093/aje/kwae065.

"Dare to feel full"-A group treatment method for sustainable weight reduction in overweight and obese adults: A randomized controlled trial with 5-years follow-up.

PLoS One. 2024 May 9;19(5):e0303021. doi: 10.1371/journal.pone.0303021. eCollection 2024.

Accessing and utilizing clinical and genomic data from an electronic health record data warehouse.

Transl Med Commun. 2023;8. doi: 10.1186/s41231-023-00140-0. Epub 2023 Mar 2.