Multiple Imputation by Fully Conditional Specification for Dealing with Missing Data in a Large Epidemiologic Study.

Author information

Liu Yang, De Anindya

Affiliations

Division of Analysis, Research, and Practice Integration, National Center for Injury Prevention and Control, U.S. Centers for Disease Control and Prevention, Atlanta, GA 30341, USA; Division of Global HIV/AIDS, Center for Global Health, U.S. Centers for Disease Control and Prevention, Atlanta, Georgia, 30333, USA.

Division of Global HIV/AIDS, Center for Global Health, U.S. Centers for Disease Control and Prevention, Atlanta, Georgia, 30333, USA.

Publication information

Int J Stat Med Res. 2015;4(3):287-295. doi: 10.6000/1929-6029.2015.04.03.7. Epub 2015 Aug 19.

DOI: 10.6000/1929-6029.2015.04.03.7

PMID: 27429686

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4945131/

Abstract

Missing data commonly occur in large epidemiologic studies. Ignoring incompleteness or handling the data inappropriately may bias study results, reduce power and efficiency, and distort important risk/benefit relationships. Standard ways of dealing with missing values, such as complete case analysis (CCA), are generally inappropriate because they lose precision and risk introducing bias. Multiple imputation by fully conditional specification (FCS MI) is a powerful and statistically valid method for creating imputations in large data sets that include both categorical and continuous variables. It specifies the multivariate imputation model on a variable-by-variable basis, offering a principled yet flexible way of addressing missing data that is particularly useful for large data sets with complex structures. However, FCS MI is still rarely used in epidemiology, and few practical resources exist to guide researchers in implementing it. We demonstrate the application of FCS MI in a large epidemiologic study evaluating national blood utilization patterns in a sub-Saharan African country and, based on this experience, describe practical tips and guidelines for implementing FCS MI.
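To make the "variable-by-variable" idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a purely numeric data matrix with NaN marking missing values, binary variables coded 0/1, a linear-normal conditional model for continuous variables and a logistic model for binary ones, and a fixed number of cycles. All function names (`fcs_impute`, `rubin_pool` and the helpers) are hypothetical. Each incomplete variable is imputed in turn from a regression on all the other variables, using parameters drawn from their approximate posterior; the cycle is repeated, the whole procedure is run m times, and the m analyses are combined with Rubin's rules.

```python
import numpy as np


def _sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def _draw_linear(X, y, rng):
    # Bayesian linear-regression draw (flat prior): sigma^2 = RSS / chi2(n - k),
    # then beta ~ N(beta_hat, sigma^2 (X'X)^-1).
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    rss = np.sum((y - X @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n - k)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)


def _draw_logistic(X, y, rng, n_iter=25):
    # Newton-Raphson fit of a logistic model, then one draw from the
    # normal approximation N(beta_hat, (X'WX)^-1).
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = _sigmoid(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    p = _sigmoid(X @ beta)
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return rng.multivariate_normal(beta, cov)


def fcs_impute(data, is_binary, n_iter=10, rng=None):
    """Return one completed copy of `data` (2-D float array, NaN = missing)."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    miss = np.isnan(data)
    filled = data.copy()
    # Starting values: random draws from each column's observed values.
    for j in range(data.shape[1]):
        observed_vals = data[~miss[:, j], j]
        filled[miss[:, j], j] = rng.choice(observed_vals, size=miss[:, j].sum())
    # Cycle through the incomplete variables, imputing each one from a
    # conditional model given the current values of all the others.
    for _ in range(n_iter):
        for j in range(data.shape[1]):
            obs = ~miss[:, j]
            if obs.all():
                continue
            others = np.delete(filled, j, axis=1)
            X = np.column_stack([np.ones(len(filled)), others])
            if is_binary[j]:
                beta = _draw_logistic(X[obs], filled[obs, j], rng)
                filled[~obs, j] = rng.binomial(1, _sigmoid(X[~obs] @ beta))
            else:
                beta, sigma = _draw_linear(X[obs], filled[obs, j], rng)
                noise = rng.normal(0.0, sigma, size=(~obs).sum())
                filled[~obs, j] = X[~obs] @ beta + noise
    return filled


def rubin_pool(estimates, variances):
    """Pool m estimates with Rubin's rules; return (estimate, total variance)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    within = u.mean()          # average within-imputation variance
    between = q.var(ddof=1)    # between-imputation variance
    return q.mean(), within + (1.0 + 1.0 / m) * between


if __name__ == "__main__":
    # Simulated example: continuous y, fully observed x, binary z;
    # y and z are made missing completely at random.
    rng = np.random.default_rng(2015)
    n = 500
    x = rng.normal(size=n)
    z = (rng.random(n) < _sigmoid(x)).astype(float)
    y = 1.0 + 2.0 * x + 0.5 * z + rng.normal(size=n)
    data = np.column_stack([y, x, z])
    data[rng.random(n) < 0.20, 0] = np.nan
    data[rng.random(n) < 0.15, 2] = np.nan
    ests, variances = [], []
    for _ in range(20):  # m = 20 imputed data sets
        completed = fcs_impute(data, is_binary=[False, False, True], rng=rng)
        ests.append(completed[:, 0].mean())
        variances.append(completed[:, 0].var(ddof=1) / n)
    est, total_var = rubin_pool(ests, variances)
    print(f"pooled mean of y = {est:.3f} (SE {np.sqrt(total_var):.3f})")
```

Drawing the regression parameters in each step, rather than plugging in point estimates, is what lets the between-imputation variance reflect uncertainty about the imputation model; without it the pooled standard error would be too small.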

Similar articles

A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study.

BMC Med Res Methodol. 2017 Jul 25;17(1):114. doi: 10.1186/s12874-017-0372-y.

Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation.

Am J Epidemiol. 2010 Mar 1;171(5):624-32. doi: 10.1093/aje/kwp425. Epub 2010 Jan 27.

Multiple imputation methods for handling missing values in a longitudinal categorical variable with restrictions on transitions over time: a simulation study.

BMC Med Res Methodol. 2019 Jan 10;19(1):14. doi: 10.1186/s12874-018-0653-0.

Multiple imputation for handling missing outcome data when estimating the relative risk.

BMC Med Res Methodol. 2017 Sep 6;17(1):134. doi: 10.1186/s12874-017-0414-5.

Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data.

Stat Med. 2014 Sep 20;33(21):3725-37. doi: 10.1002/sim.6184. Epub 2014 Apr 30.

Multiple imputation in the presence of an incomplete binary variable created from an underlying continuous variable.

Biom J. 2020 Mar;62(2):467-478. doi: 10.1002/bimj.201900011. Epub 2019 Jul 15.

Multiple imputation methods for missing multilevel ordinal outcomes.

BMC Med Res Methodol. 2023 May 9;23(1):112. doi: 10.1186/s12874-023-01909-5.

Two-stage multiple imputation with a longitudinal composite variable.

BMC Med Res Methodol. 2025 May 6;25(1):124. doi: 10.1186/s12874-025-02555-9.

Review and evaluation of imputation methods for multivariate longitudinal data with mixed-type incomplete variables.

Stat Med. 2022 Dec 30;41(30):5844-5876. doi: 10.1002/sim.9592. Epub 2022 Oct 11.

Cited by

Sex Differences in Excess Mortality Among Waitlisted Kidney, Heart, and Liver Transplant Candidates.

Transplant Direct. 2025 Aug 22;11(9):e1856. doi: 10.1097/TXD.0000000000001856. eCollection 2025 Sep.

A prospective study of semen quality and fecundability among North American couples planning pregnancy.

Andrology. 2025 Jul 11. doi: 10.1111/andr.70084.

Residential Proximity, Duration, and Health-Related Quality of Life: Insights from the Fernald Cohort.

Int J Environ Res Public Health. 2025 May 7;22(5):738. doi: 10.3390/ijerph22050738.

Development and Validation of Models to Estimate the Incident Risk of Cognitive Impairment and Atherosclerotic Cardiovascular Disease in Older Adults.

J Am Heart Assoc. 2025 Jun 3;14(11):e038949. doi: 10.1161/JAHA.124.038949. Epub 2025 May 22.

Center Volume Not Associated with Survival Benefit of Inter-Hospital Transfer for Pediatric Cardiac Surgery.

Pediatr Cardiol. 2025 May 20. doi: 10.1007/s00246-025-03881-x.

The influence of socioeconomic status on the association between residential greenness and gestational diabetes mellitus in an urban setting: a multicenter study.

BMC Public Health. 2025 May 8;25(1):1708. doi: 10.1186/s12889-025-22913-y.

Association between walking and hip fracture in women aged 65 and older: 20-year follow-up from the study of osteoporotic fractures.

Osteoporos Int. 2025 May 7. doi: 10.1007/s00198-025-07508-y.

Diet and Risk for Incident Diverticulitis in Women: A Prospective Cohort Study.

Ann Intern Med. 2025 Jun;178(6):788-795. doi: 10.7326/ANNALS-24-03353. Epub 2025 May 6.

Experiences of discrimination across the life course among pregnancy planners in the United States and Canada.

SSM Popul Health. 2025 Apr 11;30:101803. doi: 10.1016/j.ssmph.2025.101803. eCollection 2025 Jun.

Assessing patterns of chronic kidney disease care in Australian primary care: a retrospective cohort study of a national general practice dataset.

Lancet Reg Health West Pac. 2025 Apr 10;57:101541. doi: 10.1016/j.lanwpc.2025.101541. eCollection 2025 Apr.

