The impact of missing data on analyses of a time-dependent exposure in a longitudinal cohort: a simulation study.

Author information

Karahalios Amalia, Baglietto Laura, Lee Katherine J, English Dallas R, Carlin John B, Simpson Julie A

Affiliations

Centre for Molecular, Environmental, Genetic, and Analytic Epidemiology, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia.

Publication information

Emerg Themes Epidemiol. 2013 Aug 19;10(1):6. doi: 10.1186/1742-7622-10-6.

Abstract

BACKGROUND

Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for dealing with missing data in an analysis of the association between waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).

METHODS

We generated 1,000 datasets of 41,476 individuals with values of waist circumference at waves 1 and 2 and times to the events of colorectal cancer and death, designed to resemble the distributions of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15, 30 and 50%) were imposed on waist circumference at wave 2 under three missing data mechanisms: missing completely at random (MCAR), a realistic covariate-dependent missing at random (MAR) scenario, and a more extreme covariate-dependent MAR scenario. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.
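The two kinds of missingness mechanism described above can be illustrated with a minimal sketch. The abstract does not give the model used to impose missingness, so the logistic form, the single standardized covariate `age_z`, the slope value, and the function names `mcar_mask` and `mar_mask` are all assumptions made for illustration only.

```python
import math
import random


def mcar_mask(n, proportion, rng):
    """MCAR: every record has the same probability of being missing."""
    return [rng.random() < proportion for _ in range(n)]


def mar_mask(covariate, proportion, slope, rng):
    """Covariate-dependent MAR (assumed logistic form, not the authors' model).

    The missingness probability depends on an always-observed covariate.
    The intercept is found by bisection so that the average missingness
    probability matches the target proportion.
    """
    def prob(intercept, x):
        return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

    def mean_prob(intercept):
        return sum(prob(intercept, x) for x in covariate) / len(covariate)

    lo, hi = -50.0, 50.0
    for _ in range(60):  # bisection on the intercept
        mid = (lo + hi) / 2.0
        if mean_prob(mid) < proportion:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2.0
    return [rng.random() < prob(intercept, x) for x in covariate]


# Hypothetical data: a standardized covariate and wave-2 waist values (cm).
rng = random.Random(2013)
n = 20000
age_z = [rng.gauss(0.0, 1.0) for _ in range(n)]
waist2 = [rng.gauss(90.0, 12.0) for _ in range(n)]

mcar = mcar_mask(n, 0.30, rng)
mar = mar_mask(age_z, 0.30, slope=1.0, rng=rng)

# Set wave-2 waist circumference to None where the mask flags it as missing.
waist2_mcar = [None if m else w for w, m in zip(waist2, mcar)]
waist2_mar = [None if m else w for w, m in zip(waist2, mar)]
```

Under the MAR mask, records with a larger covariate value are more likely to be missing, whereas under MCAR the missing and observed groups have the same covariate distribution on average. A larger slope would give a more extreme covariate-dependent scenario.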

RESULTS

We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference was included as a strong auxiliary variable in the imputation model.
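Bias and coverage, the two performance measures reported here, are usually computed across the simulated datasets as in the sketch below. It assumes Wald-type 95% intervals (estimate plus or minus 1.96 standard errors); the function name `bias_and_coverage` and the example numbers are hypothetical and are not results from the study.

```python
import statistics


def bias_and_coverage(estimates, std_errors, true_value, z=1.96):
    """Summarize performance across simulated datasets.

    bias     = mean of the estimates minus the true value
    coverage = fraction of intervals (estimate +/- z * SE) that contain
               the true value
    """
    bias = statistics.fmean(estimates) - true_value
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return bias, covered / len(estimates)


# Made-up log hazard ratio estimates from four simulated datasets.
example_estimates = [0.10, 0.12, 0.08, 0.30]
example_std_errors = [0.05, 0.05, 0.05, 0.05]
bias, coverage = bias_and_coverage(example_estimates, example_std_errors, 0.10)
```

With 1,000 simulated datasets, coverage near 0.95 indicates that the interval estimates behave as intended.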

CONCLUSIONS

This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI compared to a complete-case analysis in the presence of up to 50% missing data for the exposure of interest when the data are MCAR or when missingness depends on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model.
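The abstract does not state how estimates from the imputed datasets were combined. Rubin's rules are the standard approach, and the minimal sketch below shows how the between-imputation variance enters the pooled variance, which is where precision can be gained or lost. The function name `pool_rubin` and the example numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m imputed-data estimates with Rubin's rules.

    pooled estimate = mean of the m estimates
    total variance  = W + (1 + 1/m) * B, where W is the mean within-imputation
                      variance and B is the sample variance of the estimates
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Made-up estimates and squared standard errors from three imputed datasets.
pooled_estimate, total_variance = pool_rubin(
    [0.10, 0.14, 0.12], [0.0004, 0.0004, 0.0004]
)
pooled_se = total_variance ** 0.5
```

An auxiliary variable that predicts the missing exposure well tends to reduce the between-imputation variance, which is consistent with the gain in precision described above.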
