Performance of binary prediction models in high-correlation low-dimensional settings: a comparison of methods.

Authors

Leeuwenberg Artuur M, van Smeden Maarten, Langendijk Johannes A, van der Schaaf Arjen, Mauer Murielle E, Moons Karel G M, Reitsma Johannes B, Schuit Ewoud

Affiliations

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.

Department of Radiation Oncology, University Medical Center Groningen, Groningen University, Groningen, The Netherlands.

Publication information

Diagn Progn Res. 2022 Jan 11;6(1):1. doi: 10.1186/s41512-021-00115-5.

DOI:10.1186/s41512-021-00115-5
PMID:35016734
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8751246/
Abstract

BACKGROUND

Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity of the prediction model. Collinearity can be dealt with by exclusion of collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate.

METHODS

We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations.
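
As a point of reference for the shrinkage approach mentioned above, the textbook penalized log-likelihood for logistic regression can be written as below. This is a generic formulation, not necessarily the exact parameterization used in the paper; the symbols ($n$, $p$, $\beta_0$, $\beta_j$, $\lambda$, $\eta_i$) are notation assumed here for illustration.

```latex
\begin{align}
(\hat{\beta}_0, \hat{\beta}) &= \arg\max_{\beta_0,\, \beta}
  \left\{ \sum_{i=1}^{n} \left[ y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right]
  - \lambda\, P(\beta) \right\},
  \qquad \eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \\
P_{\text{ridge}}(\beta) &= \sum_{j=1}^{p} \beta_j^{2},
  \qquad
P_{\text{lasso}}(\beta) = \sum_{j=1}^{p} \lvert \beta_j \rvert .
\end{align}
```

The absolute-value penalty can set coefficients exactly to zero, so Lasso performs predictor selection, whereas the squared penalty only shrinks coefficients toward zero and keeps every predictor in the model.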

RESULTS

In the conducted simulations, no effect of collinearity was observed on predictive outcomes (AUC, R², Intercept, Slope) across methods. However, a negative effect of collinearity on the stability of predictor selection was found, affecting all compared methods, but in particular methods that perform strong predictor selection (e.g., Lasso). Methods for which the included set of predictors remained most stable under increased collinearity were Ridge, PCLR, LAELR, and Dropout.
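
Three of the performance measures named above (AUC, calibration intercept, calibration slope) can be computed from predicted risks and observed outcomes. The following is a minimal sketch in plain Python, not the authors' code: the function names and the toy data are hypothetical, and it assumes the intercept and slope are estimated jointly by regressing the outcome on the logit of the predicted risk (some authors instead estimate the intercept with the slope fixed at 1).

```python
import math

def auc(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied pairs count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def calibration_intercept_slope(y_true, y_prob, n_iter=25):
    """Fit logit(P(y=1)) = a + b * logit(y_prob) by Newton-Raphson.
    Assumes every predicted risk lies strictly between 0 and 1 and that
    the outcomes are not perfectly separated by the predictions."""
    x = [math.log(p / (1.0 - p)) for p in y_prob]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient with respect to a
            g1 += (yi - mu) * xi     # gradient with respect to b
            h00 += w                 # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical toy data: predicted risks of 0.1 and 0.9, while the observed
# event rates in the two groups are 0.25 and 0.75 (predictions too extreme).
y_prob = [0.1] * 4 + [0.9] * 4
y_true = [1, 0, 0, 0, 1, 1, 1, 0]

print("AUC:", auc(y_true, y_prob))
intercept, slope = calibration_intercept_slope(y_true, y_prob)
print("intercept:", round(intercept, 4), "slope:", round(slope, 4))
```

On this toy data the slope comes out at about 0.5 and the intercept at about 0, because logit(0.9) is exactly twice logit(0.75). A slope below 1 is the usual sign of predictions that are too extreme, which is the kind of miscalibration shrinkage is intended to reduce.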

CONCLUSIONS

Based on the results, we would recommend refraining from data-driven predictor selection approaches in the presence of high collinearity, because of the increased instability of predictor selection, even in relatively high events-per-variable settings. The selection of certain predictors over others may disproportionally give the impression that included predictors have a stronger association with the outcome than excluded predictors.
