Bias due to missing exposure data using complete-case analysis in the proportional hazards regression model.

Author information

Demissie Serkalem, LaValley Michael P, Horton Nicholas J, Glynn Robert J, Cupples L Adrienne

Affiliation

Department of Biostatistics, Boston University School of Public Health, 715 Albany Street T4E, Boston, MA 02118-2526, U.S.A.

Publication information

Stat Med. 2003 Feb 28;22(4):545-57. doi: 10.1002/sim.1340.

DOI:10.1002/sim.1340

PMID:12590413

Abstract

We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.
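The contrast between exposure-dependent and follow-up-dependent missingness can be reproduced in a small simulation. The Python sketch below is not the authors' simulation code. Its data-generating model is an assumption: binary exposure, exponential event times with baseline hazard 1 and hazard ratio 3.5, and exponential censoring at rate 0.42, which gives roughly 20 per cent censoring. The four missingness rules, the sample size, and every function name are also hypothetical. The sketch deletes 40 per cent of exposure values and fits a one-covariate Cox model to the complete cases by Newton-Raphson on the partial likelihood. It then reports the relative bias of the hazard ratio, which is assumed here to be defined on the hazard-ratio scale.

```python
import math
import random

TRUE_HR = 3.5       # hazard ratio from the scenario quoted in the abstract
CENSOR_RATE = 0.42  # assumption: exponential censoring, roughly 20% censored here
MISS_FRAC = 0.4     # overall fraction of exposure values deleted


def simulate_cohort(n, rng):
    """Hypothetical model: binary exposure, exponential event and censoring times."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t_event = rng.expovariate(TRUE_HR if x else 1.0)  # baseline hazard 1
        t_cens = rng.expovariate(CENSOR_RATE)
        rows.append((min(t_event, t_cens), int(t_event <= t_cens), x))
    return rows


def is_missing(row, mechanism, median_time, rng):
    """Illustrative missingness rules, each deleting about 40% of exposures."""
    time, _event, x = row
    u = rng.random()
    if mechanism == "mcar":            # missing completely at random
        return u < MISS_FRAC
    if mechanism == "exposure":        # MAR, depends on the exposure only
        return u < (0.6 if x else 0.2)
    if mechanism == "long_followup":   # MAR, longer observed follow-up more often missing
        return u < (0.8 if time > median_time else 0.0)
    if mechanism == "short_followup":  # MAR, shorter observed follow-up more often missing
        return u < (0.8 if time <= median_time else 0.0)
    raise ValueError(f"unknown mechanism: {mechanism}")


def cox_log_hr(rows, iters=50):
    """Newton-Raphson on the Cox partial likelihood for one binary covariate.

    Assumes no tied times, which holds with probability one for continuous times.
    """
    rows = sorted(rows, key=lambda r: -r[0])  # descending time: risk sets are prefixes
    beta = 0.0
    for _ in range(iters):
        s0 = s1 = score = info = 0.0
        for _time, event, x in rows:
            w = math.exp(beta * x)
            s0 += w
            s1 += w * x
            if event:
                p = s1 / s0              # weighted share of exposed subjects at risk
                score += x - p
                info += p * (1.0 - p)    # x is binary, so x**2 == x
        step = score / info
        beta += step
        if abs(step) < 1e-9:
            break
    return beta


def cca_estimate(mechanism, n=10000, seed=1):
    """Complete-case hazard ratio under one missingness rule (same cohort per seed)."""
    rng = random.Random(seed)
    cohort = simulate_cohort(n, rng)
    median_time = sorted(r[0] for r in cohort)[n // 2]
    complete = [r for r in cohort if not is_missing(r, mechanism, median_time, rng)]
    return math.exp(cox_log_hr(complete))


def relative_bias(hr_hat, hr_true=TRUE_HR):
    """Relative bias on the hazard-ratio scale (an assumed definition)."""
    return (hr_hat - hr_true) / hr_true


for mech in ("mcar", "exposure", "long_followup", "short_followup"):
    hr = cca_estimate(mech)
    print(f"{mech:15s} HR={hr:.2f} relative bias={relative_bias(hr):+.1%}")
```

Under these assumptions the MCAR and exposure-only rules should give estimates close to 3.5. Deleting exposures mainly from subjects with long follow-up should pull the estimate clearly below 3.5, which matches the direction the abstract reports. Deleting them mainly from subjects with short follow-up tends to push it upward, but by less. Exact values depend on the seed.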


Similar articles

Unpredictable bias when using the missing indicator method or complete case analysis for missing confounder values: an empirical example.

J Clin Epidemiol. 2010 Jul;63(7):728-36. doi: 10.1016/j.jclinepi.2009.08.028. Epub 2010 Mar 25.

Identifiability assumptions for missing covariate data in failure time regression models.

Biostatistics. 2007 Apr;8(2):345-56. doi: 10.1093/biostatistics/kxl014. Epub 2006 Jul 13.

The performance of multiple imputation for missing covariate data within the context of regression relative survival analysis.

Stat Med. 2008 Dec 30;27(30):6310-31. doi: 10.1002/sim.3476.

Effects of long-term exposure to traffic-related air pollution on respiratory and cardiovascular mortality in the Netherlands: the NLCS-AIR study.

Res Rep Health Eff Inst. 2009 Mar(139):5-71; discussion 73-89.

Assessing missing data assumptions in longitudinal studies: an example using a smoking cessation trial.

Drug Alcohol Depend. 2005 Mar 7;77(3):213-25. doi: 10.1016/j.drugalcdep.2004.08.018.

Multiple imputation of missing genotype data for unrelated individuals.

Ann Hum Genet. 2006 May;70(Pt 3):372-81. doi: 10.1111/j.1529-8817.2005.00236.x.

Out of sight, not out of mind: strategies for handling missing data.

Am J Health Behav. 2008 Jan-Feb;32(1):83-92. doi: 10.5555/ajhb.2008.32.1.83.

Principal stratification designs to estimate input data missing due to death.

Biometrics. 2007 Sep;63(3):641-9; discussion 650-62. doi: 10.1111/j.1541-0420.2007.00847_1.x.

Using the outcome for imputation of missing predictor values was preferred.

J Clin Epidemiol. 2006 Oct;59(10):1092-101. doi: 10.1016/j.jclinepi.2006.01.009. Epub 2006 Jun 19.

Cited by

Data-driven risk analysis of nonlinear factor interactions in road safety using Bayesian networks.

Sci Rep. 2024 Aug 15;14(1):18948. doi: 10.1038/s41598-024-69740-6.

Machine Learning Techniques for Developing Remotely Monitored Central Nervous System Biomarkers Using Wearable Sensors: A Narrative Literature Review.

Sensors (Basel). 2023 May 31;23(11):5243. doi: 10.3390/s23115243.

Methods for handling missing data in serially sampled sputum specimens for mycobacterial culture conversion calculation.

BMC Med Res Methodol. 2022 Nov 19;22(1):297. doi: 10.1186/s12874-022-01782-8.

Econometric Issues in Prospective Economic Evaluations Alongside Clinical Trials: Combining the Nonparametric Bootstrap With Methods That Address Missing Data.

Epidemiol Rev. 2022 Dec 21;44(1):67-77. doi: 10.1093/epirev/mxac006.

Hybrid modelling for stroke care: Review and suggestions of new approaches for risk assessment and simulation of scenarios.

Neuroimage Clin. 2021;31:102694. doi: 10.1016/j.nicl.2021.102694. Epub 2021 May 7.

Multiple imputation methods for handling missing values in a longitudinal categorical variable with restrictions on transitions over time: a simulation study.

BMC Med Res Methodol. 2019 Jan 10;19(1):14. doi: 10.1186/s12874-018-0653-0.

Utility of inverse probability weighting in molecular pathological epidemiology.

Eur J Epidemiol. 2018 Apr;33(4):381-392. doi: 10.1007/s10654-017-0346-8. Epub 2017 Dec 20.

A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

BMC Med Res Methodol. 2017 Jul 25;17(1):114. doi: 10.1186/s12874-017-0372-y.

Short-term air pollution exposure aggravates Parkinson's disease in a population-based cohort.

Sci Rep. 2017 Mar 16;7:44741. doi: 10.1038/srep44741.

Estimation of indirect effect when the mediator is a censored variable.

Stat Methods Med Res. 2018 Oct;27(10):3010-3025. doi: 10.1177/0962280217690414. Epub 2017 Jan 30.
