

On stability issues in deriving multivariable regression models.

Author information

Sauerbrei Willi, Buchholz Anika, Boulesteix Anne-Laure, Binder Harald

Affiliations

Department für Medizinische Biometrie und Medizinische Informatik, Universitätsklinikum Freiburg, Stefan-Meier-Str. 26, 79104, Freiburg, Germany.

Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie, Ludwig-Maximilians-Universität München, Marchioninistr. 15, 81377, München, Germany.

Publication information

Biom J. 2015 Jul;57(4):531-55. doi: 10.1002/bimj.201300222. Epub 2014 Dec 15.

Abstract

In many areas of science where empirical data are analyzed, a common task is to identify important variables with influence on an outcome. Most often this is done by using a variable selection strategy in the context of a multivariable regression model. Using a study on ozone effects in children (n = 496, 24 covariates), we will discuss aspects relevant for deriving a suitable model. With an emphasis on model stability, we will explore and illustrate differences between predictive models and explanatory models, the key role of stopping criteria, and the value of bootstrap resampling (with and without replacement). Bootstrap resampling will be used to assess variable selection stability, to derive a predictor that incorporates model uncertainty, to check for influential points, and to visualize the variable selection process. For the latter two tasks we adapt and extend recent approaches, such as stability paths, to serve our purposes. Based on earlier experiences and on results from the example, we will argue for simpler models and that predictions are usually very similar, irrespective of the selection method used. Important differences exist for the corresponding variances, and the model uncertainty concept helps to protect against serious underestimation of the variance of a predictor derived data-dependently. Results of stability investigations illustrate severe difficulties in the task of deriving a suitable explanatory model. It seems possible to identify a small number of variables with an important and probably true influence on the outcome, but too often several variables are included whose selection may be a result of chance or may depend on a small number of observations.
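The idea of using bootstrap resampling to assess variable selection stability can be illustrated with a minimal sketch. It is not the authors' implementation: the selection procedure (backward elimination with BIC as the stopping criterion), the synthetic data, and the function names `bic`, `backward_elimination`, and `inclusion_frequencies` are all assumptions made for illustration. The sketch repeats the selection on bootstrap samples and reports how often each covariate is selected.

```python
import numpy as np


def bic(X, y):
    """BIC of an OLS fit with intercept (Gaussian likelihood, up to a constant)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def backward_elimination(X, y):
    """Drop one covariate at a time for as long as doing so lowers the BIC."""
    selected = list(range(X.shape[1]))
    current = bic(X[:, selected], y)
    while selected:
        # Score every model that omits exactly one of the remaining covariates.
        scores = [(bic(X[:, [k for k in selected if k != j]], y), j)
                  for j in selected]
        best, drop = min(scores)
        if best >= current:  # stopping criterion: no removal improves the BIC
            break
        selected.remove(drop)
        current = best
    return selected


def inclusion_frequencies(X, y, n_boot=100, seed=0):
    """Fraction of bootstrap samples in which each covariate is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        counts[backward_elimination(X[idx], y[idx])] += 1
    return counts / n_boot


# Synthetic illustration only: two covariates with a true effect, four without.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X[:, 0] + X[:, 1] + rng.normal(size=200)
freq = inclusion_frequencies(X, y)
print(np.round(freq, 2))
```

Covariates with inclusion frequencies near 1 are selected stably, while low or intermediate frequencies suggest that a covariate's selection in a single analysis may be a result of chance. Changing the stopping criterion (here, BIC) changes these frequencies, which is one way to see the role the abstract assigns to stopping criteria.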

