A simple pooling method for variable selection in multiply imputed datasets outperformed complex methods.

Affiliations

Department of Epidemiology and Data Science, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands.

Physical Therapy Practice Panken, Roermond, The Netherlands.

Publication information

BMC Med Res Methodol. 2022 Aug 4;22(1):214. doi: 10.1186/s12874-022-01693-8.

DOI: 10.1186/s12874-022-01693-8
PMID: 35927610
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9351113/
Abstract

BACKGROUND

When prognostic models are developed after multiple imputation, variable selection is advised to be applied to the pooled model. This study uses a simulation study and a practical data example to evaluate the performance of four pooling methods for variable selection in multiply imputed datasets. These methods are D1, D2, D3, and the recently extended Median-P-Rule (MPR), applied to categorical, dichotomous, and continuous variables in logistic regression models.
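To make the pooling idea concrete, the following is a minimal sketch of the median rule, assuming that each imputed dataset has already produced one overall p-value per variable (for a categorical variable, a single test across all of its levels). The function name `median_p_rule` and the p-values are hypothetical and are not taken from the study.

```python
from statistics import median

def median_p_rule(p_values_by_variable):
    """Pool p-values across imputed datasets by taking the median per variable."""
    return {name: median(ps) for name, ps in p_values_by_variable.items()}

# Hypothetical p-values from m = 5 imputed datasets (illustrative only).
p_values = {
    "age": [0.010, 0.030, 0.020, 0.040, 0.015],
    "region": [0.08, 0.04, 0.06, 0.03, 0.07],  # one overall test per dataset
}

pooled = median_p_rule(p_values)
# Keep variables whose pooled p-value is below the 0.05 threshold.
selected = [name for name, p in pooled.items() if p < 0.05]
print(pooled, selected)
```

The D1, D2, and D3 methods instead combine parameter estimates, test statistics, or likelihoods across the imputed datasets, which is why the median rule is the simplest of the four to apply.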

METHODS

Four datasets with 9 variables were simulated, combining two sample sizes (n = 200 and n = 500) with two levels of correlation between the variables (0.2 and 0.6). These datasets included 2 categorical and 2 continuous variables with 20% of the data missing at random (MAR). Multiple imputation (m = 5) was applied, and the four methods were compared with selection from the full model (without missing data). The same analyses were repeated in five multiply imputed real-world datasets (NHANES) (m = 5, p = 0.05, N = 250/300/400/500/1000).
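A simulation design of this kind could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it generates only continuous variables, uses a shared-factor construction to obtain equal pairwise correlations, and uses a hypothetical MAR rule in which the probability of missingness in one variable depends on a fully observed variable (0.3 when the observed value is positive, 0.1 otherwise, which averages to about 20%).

```python
import math
import random

def simulate(n, rho, n_vars=9, seed=1):
    """Draw n rows of n_vars standard-normal variables with pairwise correlation rho."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # common factor that induces the correlation
        rows.append([
            math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1)
            for _ in range(n_vars)
        ])
    return rows

def make_mar(rows, target=1, driver=0, seed=2):
    """Blank out `target` with a probability that depends only on the observed `driver`."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = list(row)
        p_missing = 0.3 if row[driver] > 0 else 0.1  # assumed MAR mechanism
        if rng.random() < p_missing:
            row[target] = None
        out.append(row)
    return out

def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

complete = simulate(n=500, rho=0.6)
incomplete = make_mar(complete)
missing_rate = sum(row[1] is None for row in incomplete) / len(incomplete)
print(round(missing_rate, 2))
```

Because missingness depends only on a variable that is always observed, the mechanism is MAR rather than missing completely at random.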

RESULTS

In the simulated datasets, the differences between the pooling methods were most evident in the smaller datasets. The MPR performed as well as all other pooling methods with respect to selection frequency and the P-values of the continuous and dichotomous variables. However, the MPR performed consistently better for pooling and selecting categorical variables in multiply imputed datasets, and also with respect to the stability of the selected prognostic models. Analyses in the NHANES dataset showed that all methods mostly selected the same models. Compared with each other, however, the D2 method seemed to be the least sensitive, while the MPR was the most sensitive and also the simplest and easiest method to apply.

CONCLUSIONS

Considering that MPR is the simplest and easiest pooling method for epidemiologists and applied researchers to use, we cautiously recommend using the MPR method to pool categorical variables with more than two levels after multiple imputation, in combination with backward selection (BWS) procedures. Because MPR never performed worse than the other methods for continuous and dichotomous variables, we also advise using MPR for these types of variables.
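Combining median pooling with backward selection could look like the sketch below. The names `backward_select_mpr` and `fit_p_values` are hypothetical; `fit_p_values` stands in for fitting a logistic regression on one imputed dataset and returning one overall p-value per variable. The stub used here simply looks up fixed, made-up p-values, whereas a real fit would give different p-values after each variable is dropped.

```python
from statistics import median

def backward_select_mpr(variables, datasets, fit_p_values, alpha=0.05):
    """Drop the variable with the largest median p-value until all are below alpha."""
    current = list(variables)
    while current:
        # Fit the current model on every imputed dataset.
        per_dataset = [fit_p_values(data, current) for data in datasets]
        # Pool with the median rule: one median p-value per variable.
        pooled = {v: median([p[v] for p in per_dataset]) for v in current}
        worst = max(current, key=lambda v: pooled[v])
        if pooled[worst] < alpha:
            break  # every remaining variable is significant
        current.remove(worst)
    return current

# Stub model fit: each "dataset" is a dict of fixed p-values (assumption for illustration).
def stub_fit(data, model_vars):
    return {v: data[v] for v in model_vars}

imputed = [
    {"x1": 0.01, "x2": 0.40, "x3": 0.04},
    {"x1": 0.02, "x2": 0.60, "x3": 0.06},
    {"x1": 0.03, "x2": 0.50, "x3": 0.08},
    {"x1": 0.01, "x2": 0.30, "x3": 0.03},
    {"x1": 0.02, "x2": 0.70, "x3": 0.07},
]
final_model = backward_select_mpr(["x1", "x2", "x3"], imputed, stub_fit)
print(final_model)
```

In this toy run, x2 (median 0.50) is removed first, then x3 (median 0.06), and x1 (median 0.02) is retained.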


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e69/9351113/3f65d5e8dafe/12874_2022_1693_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e69/9351113/3042e635cac2/12874_2022_1693_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e69/9351113/ca82ff15353c/12874_2022_1693_Fig3_HTML.jpg
