
Variable selection in linear regression models: Choosing the best subset is not always the best choice.

Affiliations

Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany.

Department of Mathematics and Computer Science, University of Bremen, Bremen, Germany.

Publication information

Biom J. 2024 Jan;66(1):e2200209. doi: 10.1002/bimj.202200209. Epub 2023 Aug 29.

Abstract

We consider the question of variable selection in linear regressions, in the sense of identifying the correct direct predictors (those variables that have nonzero coefficients given all candidate predictors). Best subset selection (BSS) is often considered the "gold standard," with its use being restricted only by its NP-hard nature. Alternatives such as the least absolute shrinkage and selection operator (Lasso) or the Elastic net (Enet) have become methods of choice in high-dimensional settings. A recent proposal represents BSS as a mixed-integer optimization problem, so that large problems have become computationally feasible. We present an extensive neutral comparison assessing the ability of BSS to select the correct direct predictors, compared to forward stepwise selection (FSS), Lasso, and Enet. The simulation considers a range of settings that are challenging regarding dimensionality (number of observations and variables), signal-to-noise ratios, and correlations between predictors. As a fair measure of performance, we primarily used the best possible F1-score for each method, and results were confirmed by alternative performance measures and by practical criteria for choosing the tuning parameters and subset sizes. Surprisingly, it was only in settings where the signal-to-noise ratio was high and the variables were uncorrelated that BSS reliably outperformed the other methods, even in low-dimensional settings. Furthermore, FSS performed almost identically to BSS. Our results shed new light on the usual presumption that BSS is, in principle, the best choice for selecting the correct direct predictors. Especially for correlated variables, alternatives like Enet are faster and appear to perform better in practical settings.
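To make the evaluation criterion concrete, the sketch below implements two of the ingredients the abstract names: a greedy forward stepwise selection (FSS) path and the "best possible F1-score," i.e., the maximum F1 between the selected and true supports over all subset sizes along the path. This is an illustrative reconstruction under simplified assumptions (uncorrelated Gaussian predictors, a small toy support), not the authors' simulation code; the specific dimensions and coefficients are invented for the example.

```python
import numpy as np

def f1_support(selected, true):
    """F1-score between a selected variable set and the true support."""
    selected, true = set(selected), set(true)
    tp = len(selected & true)  # correctly selected direct predictors
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)

def forward_stepwise_path(X, y, max_steps):
    """Greedy FSS: at each step add the variable that most reduces the RSS."""
    n, p = X.shape
    active, path = [], []
    for _ in range(min(max_steps, p)):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in active:
                continue
            cols = active + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_rss, best_j = rss, j
        active.append(best_j)
        path.append(list(active))  # candidate subset of each size
    return path

# Toy setting: high signal-to-noise ratio, uncorrelated predictors
rng = np.random.default_rng(0)
n, p, true_support = 100, 20, [0, 1, 2]
X = rng.standard_normal((n, p))
y = X[:, true_support] @ np.array([3.0, 2.0, 1.5]) + rng.standard_normal(n)

path = forward_stepwise_path(X, y, max_steps=10)
# Best possible F1: maximize over all subset sizes along the path,
# sidestepping the choice of a stopping rule / tuning parameter.
best_f1 = max(f1_support(s, true_support) for s in path)
print(round(best_f1, 3))
```

The "best possible" variant deliberately separates a method's selection ability from the difficulty of tuning it: the same idea applies to Lasso or Enet by scanning the supports along their regularization paths instead of the FSS path.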

