
A comparison of model selection methods for prediction in the presence of multiply imputed data.

Author information

Thao Le Thi Phuong, Geskus Ronald

Affiliations

Biostatistics group, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam.

Nuffield Department of Medicine, University of Oxford, Oxford, UK.

Publication information

Biom J. 2019 Mar;61(2):343-356. doi: 10.1002/bimj.201700232. Epub 2018 Oct 23.

DOI:10.1002/bimj.201700232
PMID:30353591
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6492211/
Abstract

Many approaches for variable selection with multiply imputed data in the development of a prognostic model have been proposed. However, no method prevails as uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of MI data: (I) Model selection on bootstrap data, using backward elimination based on AIC or lasso, and fit the final model based on the most frequently (e.g. ) selected variables over all MI and bootstrap data sets; (II) Model selection on original MI data, using lasso. The final model is obtained by (i) averaging estimates of variables that were selected in any MI data set or (ii) in 50% of the MI data; (iii) performing lasso on the stacked MI data, and (iv) as in (iii) but using individual weights as determined by the fraction of missingness. In all lasso models, we used both the optimal penalty and the 1-se rule. We considered recalibrating models to correct for overshrinkage due to the suboptimal penalty by refitting the linear predictor or all individual variables. We applied the methods on a real dataset of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, applying lasso selection with the 1-se penalty shows the best performance, both in approach I and II. Stacking MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.
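
Two of the rules named in the abstract can be illustrated with a small sketch: keeping variables selected in at least a given fraction of the MI data sets (rules (i) and (ii) of approach II), and choosing the lasso penalty by the 1-se rule. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, variable names, and cross-validation errors below are all hypothetical, and the 1-se rule is written in its common form: the largest penalty whose mean cross-validated error lies within one standard error of the minimum.

```python
from math import sqrt
from statistics import mean, stdev


def select_by_frequency(selections, threshold=0.5):
    """Keep variables selected in at least `threshold` of the data sets.

    `selections` holds one set of selected variable names per MI data set.
    threshold=0.5 mimics a "selected in 50% of the MI data" rule;
    threshold=0.0 mimics "selected in any MI data set".
    """
    n_sets = len(selections)
    counts = {}
    for chosen in selections:
        for name in chosen:
            counts[name] = counts.get(name, 0) + 1
    return sorted(v for v, c in counts.items() if c / n_sets >= threshold)


def one_se_penalty(cv_errors):
    """Return (optimal_penalty, one_se_penalty).

    `cv_errors` maps each candidate penalty to its per-fold CV errors.
    """
    summary = {}
    for lam, errs in cv_errors.items():
        # Mean CV error and its standard error across folds.
        summary[lam] = (mean(errs), stdev(errs) / sqrt(len(errs)))
    best = min(summary, key=lambda lam: summary[lam][0])
    cutoff = summary[best][0] + summary[best][1]
    # Largest penalty (sparsest model) still within one SE of the best.
    one_se = max(lam for lam, (m, _) in summary.items() if m <= cutoff)
    return best, one_se


# Hypothetical selections from four MI data sets (names are made up).
selections = [
    {"age", "gcs", "hiv"},
    {"age", "gcs"},
    {"age", "sodium"},
    {"age", "gcs", "sodium"},
]
majority = select_by_frequency(selections, threshold=0.5)
union = select_by_frequency(selections, threshold=0.0)

# Hypothetical per-fold CV errors for four candidate penalties.
cv_errors = {
    0.01: [0.50, 0.52, 0.48, 0.51, 0.49],
    0.05: [0.44, 0.50, 0.45, 0.49, 0.47],
    0.10: [0.48, 0.50, 0.46, 0.49, 0.47],
    0.50: [0.60, 0.62, 0.58, 0.61, 0.59],
}
optimal, one_se = one_se_penalty(cv_errors)
print(majority, union, optimal, one_se)
```

In this toy input the 1-se penalty is larger than the optimal one, which is why the 1-se rule tends to give sparser models. Stacking the MI data, as in rules (iii) and (iv), avoids the threshold argument altogether because a single lasso fit produces a single set of selected variables.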


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352b/6492211/39944fdae21c/BIMJ-61-343-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352b/6492211/7fa4caf2e7d8/BIMJ-61-343-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352b/6492211/d116ae5233bf/BIMJ-61-343-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352b/6492211/fc9a9100ba03/BIMJ-61-343-g003.jpg

Similar articles

1
A comparison of model selection methods for prediction in the presence of multiply imputed data.
Biom J. 2019 Mar;61(2):343-356. doi: 10.1002/bimj.201700232. Epub 2018 Oct 23.
2
Variable selection for multiply-imputed data with application to dioxin exposure study.
Stat Med. 2013 Sep 20;32(21):3646-59. doi: 10.1002/sim.5783. Epub 2013 Mar 25.
3
Validation of prediction models based on lasso regression with multiply imputed data.
BMC Med Res Methodol. 2014 Oct 16;14:116. doi: 10.1186/1471-2288-14-116.
4
Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods.
J Comput Graph Stat. 2022;31(4):1063-1075. doi: 10.1080/10618600.2022.2035739. Epub 2022 Mar 28.
5
Bootstrap model selection had similar performance for selecting authentic and noise variables compared to backward variable elimination: a simulation study.
J Clin Epidemiol. 2008 Oct;61(10):1009-17.e1. doi: 10.1016/j.jclinepi.2007.11.014. Epub 2008 Jun 9.
6
Model selection of generalized estimating equations with multiply imputed longitudinal data.
Biom J. 2013 Nov;55(6):899-911. doi: 10.1002/bimj.201200236. Epub 2013 Aug 23.
7
Graphical modeling of binary data using the LASSO: a simulation study.
BMC Med Res Methodol. 2012 Feb 21;12:16. doi: 10.1186/1471-2288-12-16.
8
A simple pooling method for variable selection in multiply imputed datasets outperformed complex methods.
BMC Med Res Methodol. 2022 Aug 4;22(1):214. doi: 10.1186/s12874-022-01693-8.
9
A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data.
BMC Med Res Methodol. 2022 May 4;22(1):132. doi: 10.1186/s12874-022-01608-7.
10
Analyzing evidence-based falls prevention data with significant missing information using variable selection after multiple imputation.
J Appl Stat. 2021 Oct 7;50(3):724-743. doi: 10.1080/02664763.2021.1985090. eCollection 2023.

Cited by

1
Resilience and its association with caregiving and psychosocial factors among lung cancer caregivers in Vietnam.
Eur J Oncol Nurs. 2025 Aug;77:102932. doi: 10.1016/j.ejon.2025.102932. Epub 2025 Jul 16.
2
Variable selection methods for descriptive modeling.
PLoS One. 2025 Jun 2;20(6):e0321601. doi: 10.1371/journal.pone.0321601. eCollection 2025.
3
Psychiatric Medication Treatment, Concurrent Substance Use, and Subsistence Difficulty Among People Who Inject Drugs with Diagnosed Mental Health Disorders in Los Angeles and Denver.
Subst Use Misuse. 2025;60(10):1556-1564. doi: 10.1080/10826084.2025.2506135. Epub 2025 May 20.
4
The PROgnostic ModEl for chronic lung disease (PRO-MEL): development and temporal validation.
BMC Pulm Med. 2024 Aug 30;24(1):429. doi: 10.1186/s12890-024-03233-0.
5
Simple severity scale for perforated peptic ulcer with generalized peritonitis: a derivation and internal validation study.
Int J Surg. 2024 Nov 1;110(11):7134-7141. doi: 10.1097/JS9.0000000000002037.
6
Insufficient duration of insecticidal efficacy of Yahe insecticide-treated nets in Papua New Guinea.
Malar J. 2024 Jun 5;23(1):175. doi: 10.1186/s12936-024-05005-x.
7
Demographic and Behavioral Differences Between Adolescents and Young Adults Who Use E-Cigarettes at Low and High Frequency.
Subst Use Addctn J. 2024 Apr;45(2):232-239. doi: 10.1177/29767342231214115. Epub 2024 Jan 2.
8
Development and validation of a work-related risk score for upper-extremity musculoskeletal disorders in a French working population.
Scand J Work Environ Health. 2023 Nov 1;49(8):558-568. doi: 10.5271/sjweh.4119. Epub 2023 Sep 6.
9
Prognosis and prediction of antibiotic benefit in adults with clinically diagnosed acute rhinosinusitis: an individual participant data meta-analysis.
Diagn Progn Res. 2023 Sep 5;7(1):16. doi: 10.1186/s41512-023-00154-0.
10
Hyperglycemia, Ischemic Lesions, and Functional Outcomes After Intracerebral Hemorrhage.
J Am Heart Assoc. 2023 Jul 4;12(13):e028632. doi: 10.1161/JAHA.122.028632. Epub 2023 Jun 22.

References

1
A simulation based method for assessing the statistical significance of logistic regression models after common variable selection procedures.
Commun Stat Simul Comput. 2017;46(9):7180-7193. doi: 10.1080/03610918.2016.1230216. Epub 2016 Sep 30.
2
Variable Selection in the Presence of Missing Data: Imputation-based Methods.
Wiley Interdiscip Rev Comput Stat. 2017 Sep-Oct;9(5). doi: 10.1002/wics.1402. Epub 2017 May 24.
3
Prognostic Models for 9-Month Mortality in Tuberculous Meningitis.
Clin Infect Dis. 2018 Feb 1;66(4):523-532. doi: 10.1093/cid/cix849.
4
Intensified Antituberculosis Therapy in Adults with Tuberculous Meningitis.
N Engl J Med. 2016 Jan 14;374(2):124-34. doi: 10.1056/NEJMoa1507062.
5
Variable selection models based on multiple imputation with an application for predicting median effective dose and maximum effect.
J Stat Comput Simul. 2015;85(9):1902-1916. doi: 10.1080/00949655.2014.907801.
6
Variable selection in the presence of missing data: resampling and imputation.
Biostatistics. 2015 Jul;16(3):596-610. doi: 10.1093/biostatistics/kxv003. Epub 2015 Feb 18.
7
Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models.
Stat Methods Med Res. 2017 Apr;26(2):796-808. doi: 10.1177/0962280214558972. Epub 2014 Nov 19.
8
Validation of prediction models based on lasso regression with multiply imputed data.
BMC Med Res Methodol. 2014 Oct 16;14:116. doi: 10.1186/1471-2288-14-116.
9
Variable selection for multiply-imputed data with application to dioxin exposure study.
Stat Med. 2013 Sep 20;32(21):3646-59. doi: 10.1002/sim.5783. Epub 2013 Mar 25.
10
Randomized pharmacokinetic and pharmacodynamic comparison of fluoroquinolones for tuberculous meningitis.
Antimicrob Agents Chemother. 2011 Jul;55(7):3244-53. doi: 10.1128/AAC.00064-11. Epub 2011 Apr 18.