A fair comparison of tree-based and parametric methods in multiple imputation by chained equations.

Affiliations

Pfizer Worldwide Research and Development, Cambridge, Massachusetts.

Department of Biostatistics, University of Kentucky, Lexington, Kentucky.

Publication information

Stat Med. 2020 Apr 15;39(8):1156-1166. doi: 10.1002/sim.8468. Epub 2020 Jan 29.

Abstract

Multiple imputation by chained equations (MICE) has emerged as a leading strategy for imputing missing epidemiological data due to its ease of implementation and ability to maintain unbiased effect estimates and valid inference. Within the MICE algorithm, imputation can be performed using a variety of parametric or nonparametric methods. Literature has suggested that nonparametric tree-based imputation methods outperform parametric methods in terms of bias and coverage when there are interactions or other nonlinear effects among the variables. However, these studies fail to provide a fair comparison as they do not follow the well-established recommendation that any effects in the final analysis model (including interactions) should be included in the parametric imputation model. We show via simulation that properly incorporating interactions in the parametric imputation model leads to much better performance. In fact, correctly specified parametric imputation and tree-based random forest imputation perform similarly when estimating the interaction effect. Parametric imputation leads to slightly higher coverage for the interaction effect, but it has wider confidence intervals than random forest imputation and requires correct specification of the imputation model. Epidemiologists should take care in specifying MICE imputation models, and this paper assists in that task by providing a fair comparison of parametric and tree-based imputation in MICE.
