Impute-then-exclude versus exclude-then-impute: Lessons when imputing a variable used both in cohort creation and as an independent variable in the analysis model.

Affiliations

ICES, Toronto, Ontario, Canada.

Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada.

Publication information

Stat Med. 2023 May 10;42(10):1525-1541. doi: 10.1002/sim.9685. Epub 2023 Feb 19.

DOI: 10.1002/sim.9685

PMID: 36807923

Abstract

We examined the setting in which a variable that is subject to missingness is used both as an inclusion/exclusion criterion for creating the analytic sample and subsequently as the primary exposure in the analysis model that is of scientific interest. An example is cancer stage, where patients with stage IV cancer are often excluded from the analytic sample, and cancer stage (I to III) is an exposure variable in the analysis model. We considered two analytic strategies. The first strategy, referred to as "exclude-then-impute," excludes subjects for whom the observed value of the target variable is equal to the specified value and then uses multiple imputation to complete the data in the resultant sample. The second strategy, referred to as "impute-then-exclude," first uses multiple imputation to complete the data and then excludes subjects based on the observed or filled-in values in the completed samples. Monte Carlo simulations were used to compare five methods (one based on "exclude-then-impute" and four based on "impute-then-exclude") along with the use of a complete case analysis. We considered both missing completely at random and missing at random missing data mechanisms. We found that an impute-then-exclude strategy using substantive model compatible fully conditional specification tended to have superior performance across 72 different scenarios. We illustrated the application of these methods using empirical data on patients hospitalized with heart failure when heart failure subtype was used for cohort creation (excluding subjects with heart failure with preserved ejection fraction) and was also an exposure in the analysis model.
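The two strategies in the abstract differ only in the order of the exclusion and imputation steps. The Python sketch below illustrates that ordering under loudly stated assumptions: simulated data with a hypothetical three-level variable `x` (levels 0 and 1 retained, level 2 excluded, loosely analogous to excluding one heart failure subtype), a binary auxiliary covariate `z`, a binary outcome `y`, and `x` missing completely at random. The imputation step is a simple approximate Bayesian bootstrap hot-deck within (z, y) cells, chosen only because it fits in the standard library; it is not one of the five methods evaluated in the paper (in particular, it is not substantive model compatible FCS). All function names and parameter values are hypothetical, and only the pooled point estimate from Rubin's rules is computed.

```python
import random

# Hypothetical coding of the target variable x (e.g., a three-level subtype):
# levels 0 and 1 stay in the cohort; EXCLUDED_LEVEL defines the exclusion criterion.
EXCLUDED_LEVEL = 2


def simulate(n, p_missing, rng):
    """Simulate hypothetical records (x, z, y); x is set to None completely at random."""
    data = []
    for _ in range(n):
        z = rng.random() < 0.5  # binary auxiliary covariate
        weights = [0.4, 0.3, 0.3] if z else [0.2, 0.3, 0.5]
        x = rng.choices([0, 1, 2], weights=weights)[0]
        # Hypothetical outcome model: risk of the binary outcome y depends on x and z.
        risk = 0.10 + 0.15 * (x == 1) + 0.25 * (x == 2) + 0.05 * z
        y = rng.random() < risk
        x_obs = None if rng.random() < p_missing else x  # MCAR missingness
        data.append((x_obs, z, y))
    return data


def hot_deck_abb(records, rng):
    """Fill in missing x with an approximate Bayesian bootstrap hot-deck within (z, y) cells."""
    donors = {}
    for x, z, y in records:
        if x is not None:
            donors.setdefault((z, y), []).append(x)
    # ABB: bootstrap each cell's donor pool once per imputation, then draw from it.
    pools = {cell: rng.choices(obs, k=len(obs)) for cell, obs in donors.items()}
    return [(rng.choice(pools[(z, y)]) if x is None else x, z, y) for x, z, y in records]


def risk_difference(records):
    """Analysis model: risk of y when x == 1 minus risk of y when x == 0."""
    y1 = [y for x, _, y in records if x == 1]
    y0 = [y for x, _, y in records if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def complete_case(data):
    """Drop subjects with missing x and those with the excluded level."""
    return risk_difference([r for r in data if r[0] is not None and r[0] != EXCLUDED_LEVEL])


def exclude_then_impute(data, m, rng):
    # Exclusion uses only *observed* x, so every subject with missing x stays in,
    # and the remaining donors can only carry the retained levels 0 and 1.
    kept = [r for r in data if r[0] != EXCLUDED_LEVEL]
    estimates = [risk_difference(hot_deck_abb(kept, rng)) for _ in range(m)]
    return sum(estimates) / m  # pooled point estimate (Rubin's rules)


def impute_then_exclude(data, m, rng):
    estimates = []
    for _ in range(m):
        completed = hot_deck_abb(data, rng)
        # Exclusion uses observed *or imputed* x in each completed data set.
        kept = [r for r in completed if r[0] != EXCLUDED_LEVEL]
        estimates.append(risk_difference(kept))
    return sum(estimates) / m  # pooled point estimate (Rubin's rules)


if __name__ == "__main__":
    rng = random.Random(2023)
    data = simulate(20000, p_missing=0.3, rng=rng)
    print("complete case:      ", round(complete_case(data), 3))
    print("exclude-then-impute:", round(exclude_then_impute(data, 10, rng), 3))
    print("impute-then-exclude:", round(impute_then_exclude(data, 10, rng), 3))
```

In this toy MCAR setting, exclude-then-impute keeps every subject with a missing value, including those whose unobserved value is the excluded level, and can only impute them as 0 or 1; impute-then-exclude lets such subjects be imputed as level 2 and then dropped. The sketch omits Rubin's-rules variances and the MAR scenarios, which could be explored by making the missingness probability depend on `z` or `y`.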


Similar articles

Impute-then-exclude versus exclude-then-impute: Lessons when imputing a variable used both in cohort creation and as an independent variable in the analysis model.

Stat Med. 2023 May 10;42(10):1525-1541. doi: 10.1002/sim.9685. Epub 2023 Feb 19.

Imputation strategies when a continuous outcome is to be dichotomized for responder analysis: a simulation study.

BMC Med Res Methodol. 2019 Jul 23;19(1):161. doi: 10.1186/s12874-019-0793-x.

Dealing with missing delirium assessments in prospective clinical studies of the critically ill: a simulation study and reanalysis of two delirium studies.

BMC Med Res Methodol. 2021 May 6;21(1):97. doi: 10.1186/s12874-021-01274-1.

Multiple imputation for missing values through conditional Semiparametric odds ratio models.

Biometrics. 2011 Sep;67(3):799-809. doi: 10.1111/j.1541-0420.2010.01538.x. Epub 2011 Jan 6.

Multiple imputation with missing data indicators.

Stat Methods Med Res. 2021 Dec;30(12):2685-2700. doi: 10.1177/09622802211047346. Epub 2021 Oct 13.

Logistic regression vs. predictive mean matching for imputing binary covariates.

Stat Methods Med Res. 2023 Nov;32(11):2172-2183. doi: 10.1177/09622802231198795. Epub 2023 Sep 26.

Comparison of methods for imputing ordinal data using multivariate normal imputation: a case study of non-linear effects in a large cohort study.

Stat Med. 2012 Dec 30;31(30):4164-74. doi: 10.1002/sim.5445. Epub 2012 Jul 24.

Improving Outcome Predictions for Patients Receiving Mechanical Circulatory Support by Optimizing Imputation of Missing Values.

Circ Cardiovasc Qual Outcomes. 2021 Sep;14(9):e007071. doi: 10.1161/CIRCOUTCOMES.120.007071. Epub 2021 Sep 14.

Evaluation of approaches for multiple imputation of three-level data.

BMC Med Res Methodol. 2020 Aug 12;20(1):207. doi: 10.1186/s12874-020-01079-8.

Multiple imputation to deal with missing EQ-5D-3L data: Should we impute individual domains or the actual index?

Qual Life Res. 2015 Apr;24(4):805-15. doi: 10.1007/s11136-014-0837-y. Epub 2014 Dec 4.

Cited by

POP-REFINE: A Comprehensive Framework for Evaluating and Optimizing Representativeness in Clinical Trials.

Clin Pharmacol Ther. 2025 Apr;117(4):1051-1060. doi: 10.1002/cpt.3543. Epub 2024 Dec 27.
