Variable selection under multiple imputation using the bootstrap in a prognostic study.

Author information

Heymans Martijn W, van Buuren Stef, Knol Dirk L, van Mechelen Willem, de Vet Henrica C W

Affiliation

Vrije Universiteit, Institute for Health Sciences, Department of Methodology and Applied Biostatistics, Amsterdam, The Netherlands.

Publication information

BMC Med Res Methodol. 2007 Jul 13;7:33. doi: 10.1186/1471-2288-7-33.

DOI:10.1186/1471-2288-7-33

PMID:17629912

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1945032/

Abstract

BACKGROUND

Missing data are a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty, which allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection.

METHOD

In our prospective cohort study, we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Across the outcome and prognostic variables, the proportion of missing data ranged from 0% to 48.1%. We used four methods to investigate the influence of sampling and imputation variation, respectively: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of the prognostic models developed by the four methods were assessed at different inclusion levels.
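The inclusion-frequency rule described above can be illustrated with a minimal Python sketch. It assumes that some variable selection procedure has already been run on each resampled or imputed data set and has returned the set of retained variables; the variable names, the four example runs, and the helper names `inclusion_frequencies` and `select_at_level` are hypothetical and do not come from the study.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def inclusion_frequencies(
    candidates: Iterable[str], selected_sets: Iterable[Set[str]]
) -> Dict[str, float]:
    """Proportion of fitted models in which each candidate variable was retained."""
    runs = list(selected_sets)
    if not runs:
        raise ValueError("need at least one fitted model")
    counts = Counter(var for run in runs for var in run)
    # Candidates that were never retained get a frequency of 0.0.
    return {var: counts.get(var, 0) / len(runs) for var in candidates}


def select_at_level(freqs: Dict[str, float], level: float) -> List[str]:
    """Variables with inclusion frequency >= level; level 0.0 gives the full model."""
    return sorted(var for var, freq in freqs.items() if freq >= level)


# Hypothetical example: variables retained in four resampled/imputed data sets.
candidates = ["age", "fear_avoidance", "pain_intensity", "sex"]
runs = [
    {"pain_intensity", "age"},
    {"pain_intensity", "fear_avoidance"},
    {"pain_intensity", "age", "fear_avoidance"},
    {"pain_intensity"},
]
freqs = inclusion_frequencies(candidates, runs)
chosen = select_at_level(freqs, 0.75)
full_model = select_at_level(freqs, 0.0)
```

In this made-up example only `pain_intensity` reaches the 75% inclusion level, while an inclusion level of 0% keeps every candidate, which corresponds to the full model.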

RESULTS

We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined, over the range of 0% (full model) to 90% of variable selection, bootstrap-corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found.
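For readers unfamiliar with the c-index, the following Python sketch shows how an apparent (uncorrected) c-index could be computed, assuming a binary outcome coded 0/1. It does not implement the bootstrap correction or the calibration slope reported above, and the outcome and predicted-risk values are invented for illustration.

```python
from itertools import product
from typing import Sequence


def c_index(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Apparent c-index for a binary outcome.

    Fraction of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half a concordant pair.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical outcomes (1 = event) and predicted risks.
outcome = [1, 1, 0, 0, 1]
risk = [0.9, 0.6, 0.4, 0.7, 0.7]
apparent_c = c_index(outcome, risk)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, so the reported values of 0.70 to 0.71 sit between the two.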

CONCLUSION

We recommend accounting for both imputation and sampling variation in data sets with missing values. The new procedure of combining MI with bootstrapping for variable selection results in multivariable prognostic models with good performance and is therefore attractive to apply to data sets with missing values.
