The impact of missing data on analyses of a time-dependent exposure in a longitudinal cohort: a simulation study.

Author information

Karahalios Amalia, Baglietto Laura, Lee Katherine J, English Dallas R, Carlin John B, Simpson Julie A

Affiliation

Centre for Molecular, Environmental, Genetic, and Analytic Epidemiology, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia.

Publication information

Emerg Themes Epidemiol. 2013 Aug 19;10(1):6. doi: 10.1186/1742-7622-10-6.

DOI:10.1186/1742-7622-10-6

PMID:23947681

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3751092/

Abstract

BACKGROUND

Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for dealing with missing data in an analysis of the association between waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).

METHODS

We generated 1,000 datasets of 41,476 individuals, each with values of waist circumference at waves 1 and 2 and times to the events of colorectal cancer and death, to resemble the distributions of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15%, 30% and 50%) were imposed on waist circumference at wave 2 using three missing data mechanisms: Missing Completely at Random (MCAR), and realistic and more extreme covariate-dependent Missing at Random (MAR) scenarios. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.
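To make the missingness mechanisms concrete, the following is a minimal sketch of how MCAR and covariate-dependent MAR missingness could be imposed on a wave 2 measurement. It is not the authors' code. The covariate (`age`), the distributions, the logistic form of the MAR model, and the `slope` values standing in for the "realistic" and "more extreme" scenarios are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def missing_probabilities(covariate, proportion, mechanism, slope=1.0):
    """Per-individual probability that the wave 2 value is missing."""
    n = len(covariate)
    if mechanism == "MCAR":
        # Same probability for everyone, unrelated to any data.
        return [proportion] * n
    if mechanism != "MAR":
        raise ValueError("mechanism must be 'MCAR' or 'MAR'")
    # MAR (assumed form): logistic model in a standardised observed covariate.
    mean = sum(covariate) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in covariate) / n)
    z = [(c - mean) / sd for c in covariate]
    # Bisection on the intercept so the average probability hits the target.
    lo, hi = -20.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        avg = sum(logistic(mid + slope * zi) for zi in z) / n
        if avg < proportion:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2.0
    return [logistic(intercept + slope * zi) for zi in z]


def impose_missing(values, probabilities, rng):
    """Replace each value by None with its own missingness probability."""
    return [None if rng.random() < p else v
            for v, p in zip(values, probabilities)]


# Illustrative data only; these are not the cohort's distributions.
rng = random.Random(2013)
n = 5000
age = [rng.gauss(55.0, 9.0) for _ in range(n)]   # hypothetical covariate
wc2 = [rng.gauss(90.0, 12.0) for _ in range(n)]  # hypothetical wave 2 values

for mechanism, slope in [("MCAR", 0.0), ("MAR", 0.5), ("MAR", 1.5)]:
    probs = missing_probabilities(age, 0.30, mechanism, slope)
    wc2_observed = impose_missing(wc2, probs, rng)
    fraction = sum(v is None for v in wc2_observed) / n
    print(mechanism, slope, round(fraction, 3))
```

Solving for the intercept keeps the overall proportion of missing values fixed at the target while the slope controls how strongly missingness depends on the covariate, so scenarios differ only in mechanism and not in the amount of missing data.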

RESULTS

We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference at wave 1 was included as a strong auxiliary variable in the imputation model.
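The performance measures named here (bias, precision and coverage of interval estimates) are typically computed across the simulated datasets as in the sketch below. The function name `performance`, the use of a normal-approximation interval with multiplier 1.96, and the toy numbers are assumptions for illustration and are not taken from the study.

```python
import math


def performance(estimates, std_errors, true_value, z=1.96):
    """Summarise one method across simulated datasets.

    estimates  : one point estimate (e.g. a log hazard ratio) per dataset
    std_errors : the matching model-based standard errors
    true_value : the value used to generate the data
    """
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    # Bias: average estimate minus the truth.
    bias = mean_estimate - true_value
    # Empirical SE: spread of the estimates, a measure of precision.
    empirical_se = math.sqrt(
        sum((e - mean_estimate) ** 2 for e in estimates) / (n - 1)
    )
    # Coverage: share of intervals estimate +/- z * SE containing the truth.
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return {"bias": bias, "empirical_se": empirical_se, "coverage": covered / n}


# Toy input with made-up numbers, only to show the call.
summary = performance(
    estimates=[0.10, 0.14, 0.08, 0.12],
    std_errors=[0.02, 0.02, 0.02, 0.02],
    true_value=0.11,
)
print(summary)
```

Coverage near 0.95 for a nominal 95% interval indicates that the standard errors reflect the actual sampling variability of the estimator.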

CONCLUSIONS

This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI compared to a complete-case analysis in the presence of up to 50% missing data for the exposure of interest when the data are MCAR, or missing dependent on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model.
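For readers unfamiliar with how MI yields a single estimate and standard error, the sketch below applies Rubin's rules, the conventional way of combining results across imputed datasets. The abstract does not describe the software or pooling details used, so this is a generic illustration with made-up inputs rather than the authors' procedure; the function name `pool_rubin` is hypothetical.

```python
import math


def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates : point estimate from each of the m imputed datasets
    variances : squared standard error from each imputed dataset
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the per-dataset variances.
    within = sum(variances) / m
    # Between-imputation variance: variability due to the missing data.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Made-up log hazard ratios and variances from m = 5 imputations.
estimate, std_error = pool_rubin(
    estimates=[0.21, 0.19, 0.23, 0.20, 0.22],
    variances=[0.004, 0.004, 0.005, 0.004, 0.005],
)
print(round(estimate, 3), round(std_error, 3))
```

A strong auxiliary variable in the imputation model makes the imputed values more alike across imputations, which shrinks the between-imputation term and therefore the pooled standard error. This is consistent with the precision gain described in the conclusions.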
