Fractional ridge regression: a fast, interpretable reparameterization of ridge regression.

Affiliations

Department of Psychology and the eScience Institute, University of Washington, Guthrie Hall 119A, Seattle, WA, 98195, USA.

Center for Magnetic Resonance Research, University of Minnesota, Twin Cities, 2021 6th St SE, Minneapolis, MN, 55455, USA.

Publication information

Gigascience. 2020 Nov 30;9(12). doi: 10.1093/gigascience/giaa133.

Abstract

BACKGROUND

Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates. However, efficient and appropriate selection of α can be challenging. This becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and correlations across predictors, it is also not straightforwardly interpretable.
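The α-selection workflow described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's method: it uses the closed-form ridge solution, a single hold-out split instead of full cross-validation, and an arbitrary log-spaced α grid, which (as the paragraph notes) must be guessed because the appropriate range depends on the scale of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 80 samples, 10 predictors, known coefficients plus noise.
X = rng.standard_normal((80, 10))
beta_true = rng.standard_normal(10)
y = X @ beta_true + 0.5 * rng.standard_normal(80)

def ridge(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^{-1} X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Hold-out split to score each candidate alpha (a stand-in for k-fold CV).
X_tr, X_val = X[:60], X[60:]
y_tr, y_val = y[:60], y[60:]

# The grid itself is a guess; too narrow or too wide a range wastes fits.
alphas = np.logspace(-4, 4, 20)
errors = [np.mean((y_val - X_val @ ridge(X_tr, y_tr, a)) ** 2) for a in alphas]
best_alpha = alphas[int(np.argmin(errors))]
```

Note that `best_alpha` is only meaningful relative to the scale of `X` and `y`, which is exactly the interpretability problem the paper targets.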

RESULTS

The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets.
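The γ reparameterization can be illustrated with a simplified sketch. The published algorithm works differently (it exploits the SVD to evaluate the full γ path efficiently; see the fracridge repository for the actual implementation); here, as an assumption-laden stand-in, we use the SVD only to make each ridge solve cheap and find the α matching a requested γ by bisection, using the fact that the coefficient norm decreases monotonically with α.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(100)

# With X = U S V', the ridge coefficients for any alpha are
# beta(alpha) = V @ diag(s / (s^2 + alpha)) @ U' @ y,
# so each candidate alpha costs only O(rank) after one SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
uty = U.T @ y

def coef_norm(alpha):
    """L2-norm of the ridge coefficients at a given alpha."""
    return np.linalg.norm(s / (s**2 + alpha) * uty)

ols_norm = coef_norm(0.0)  # alpha = 0 gives the unregularized (OLS) norm

def alpha_for_fraction(gamma, lo=0.0, hi=1e10, n_iter=200):
    """Bisection stand-in (not the paper's algorithm): find alpha such that
    ||beta(alpha)|| = gamma * ||beta_OLS||. Valid for 0 < gamma < 1."""
    target = gamma * ols_norm
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if coef_norm(mid) > target:
            lo = mid  # norm still too large: regularize more
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha_half = alpha_for_fraction(0.5)
achieved_gamma = coef_norm(alpha_half) / ols_norm
```

The point of the reparameterization is visible here: γ = 0.5 means the same thing (coefficients shrunk to half their OLS norm) regardless of the scale of the data, whereas the α that achieves it varies from problem to problem.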

CONCLUSION

Fractional ridge regression has several benefits: the solutions obtained for different values of γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for analysis of large complex datasets.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09f2/7702219/fc7e7b6e9b59/giaa133fig1.jpg
