Fractional ridge regression: a fast, interpretable reparameterization of ridge regression.

Affiliations

Department of Psychology and the eScience Institute, University of Washington, Guthrie Hall 119A, Seattle, WA, 98195, USA.

Center for Magnetic Resonance Research, University of Minnesota, Twin Cities, 2021 6th St SE, Minneapolis, MN, 55455, USA.

Publication information

Gigascience. 2020 Nov 30;9(12). doi: 10.1093/gigascience/giaa133.

Abstract

BACKGROUND

Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates. However, efficient and appropriate selection of α can be challenging. This becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and correlations across predictors, it is also not straightforwardly interpretable.
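The α-selection workflow described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's method: it uses the closed-form ridge solution, a single hold-out split instead of full cross-validation, and an arbitrary log-spaced α grid, which (as the paragraph notes) must be guessed because the appropriate range depends on the scale of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 80 samples, 10 predictors, known coefficients plus noise.
X = rng.standard_normal((80, 10))
beta_true = rng.standard_normal(10)
y = X @ beta_true + 0.5 * rng.standard_normal(80)

def ridge(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^{-1} X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Hold-out split to score each candidate alpha (a stand-in for k-fold CV).
X_tr, X_val = X[:60], X[60:]
y_tr, y_val = y[:60], y[60:]

# The grid itself is a guess; too narrow or too wide a range wastes fits.
alphas = np.logspace(-4, 4, 20)
errors = [np.mean((y_val - X_val @ ridge(X_tr, y_tr, a)) ** 2) for a in alphas]
best_alpha = alphas[int(np.argmin(errors))]
```

Note that `best_alpha` is only meaningful relative to the scale of `X` and `y`, which is exactly the interpretability problem the paper targets.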

RESULTS

The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets.
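The γ reparameterization can be illustrated with a simplified sketch. The published algorithm works differently (it exploits the SVD to evaluate the full γ path efficiently; see the fracridge repository for the actual implementation); here, as an assumption-laden stand-in, we use the SVD only to make each ridge solve cheap and find the α matching a requested γ by bisection, using the fact that the coefficient norm decreases monotonically with α.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(100)

# With X = U S V', the ridge coefficients for any alpha are
# beta(alpha) = V @ diag(s / (s^2 + alpha)) @ U' @ y,
# so each candidate alpha costs only O(rank) after one SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
uty = U.T @ y

def coef_norm(alpha):
    """L2-norm of the ridge coefficients at a given alpha."""
    return np.linalg.norm(s / (s**2 + alpha) * uty)

ols_norm = coef_norm(0.0)  # alpha = 0 gives the unregularized (OLS) norm

def alpha_for_fraction(gamma, lo=0.0, hi=1e10, n_iter=200):
    """Bisection stand-in (not the paper's algorithm): find alpha such that
    ||beta(alpha)|| = gamma * ||beta_OLS||. Valid for 0 < gamma < 1."""
    target = gamma * ols_norm
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if coef_norm(mid) > target:
            lo = mid  # norm still too large: regularize more
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha_half = alpha_for_fraction(0.5)
achieved_gamma = coef_norm(alpha_half) / ols_norm
```

The point of the reparameterization is visible here: γ = 0.5 means the same thing (coefficients shrunk to half their OLS norm) regardless of the scale of the data, whereas the α that achieves it varies from problem to problem.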

CONCLUSION

Fractional ridge regression has several benefits: the solutions obtained for different values of γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for analysis of large complex datasets.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09f2/7702219/fc7e7b6e9b59/giaa133fig1.jpg
