Radhakrishnan Adityanarayanan, Belkin Mikhail, Drusvyatskiy Dmitriy
Applied Math, Harvard University, MA 02138.
Broad Institute of Massachusetts Institute of Technology and Harvard, MA 02138.
Proc Natl Acad Sci U S A. 2025 Apr;122(13):e2411325122. doi: 10.1073/pnas.2411325122. Epub 2025 Mar 28.
A fundamental problem in machine learning is to understand how neural networks make accurate predictions while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction, a process called feature learning. Recent work [A. Radhakrishnan, D. Beaglehole, P. Pandit, M. Belkin, 1461-1467 (2024)] posited that the effects of feature learning can be elicited from a classical statistical estimator called the average gradient outer product (AGOP). The authors proposed Recursive Feature Machines (RFMs) as an algorithm that explicitly performs feature learning by alternating between 1) reweighting the feature vectors by the AGOP and 2) learning the prediction function in the transformed space. In this work, we develop theoretical guarantees for how RFM performs dimensionality reduction by focusing on the class of overparameterized problems arising in sparse linear regression and low-rank matrix recovery. Specifically, we show that RFM restricted to linear models (lin-RFM) reduces to a variant of the well-studied Iteratively Reweighted Least Squares (IRLS) algorithm. Furthermore, our results connect feature learning in neural networks to classical sparse recovery algorithms and shed light on how neural networks recover low-rank structure from data. In addition, we provide an implementation of lin-RFM that scales to matrices with millions of missing entries. Our implementation is faster than standard IRLS algorithms since it avoids forming singular value decompositions, and it outperforms deep linear networks for sparse linear regression and low-rank matrix completion.
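The alternating scheme described in the abstract, reweighting features by the AGOP and then refitting, can be sketched for the sparse linear regression setting. For a linear model f(x) = w^T x the gradient is w everywhere, so the AGOP is simply ww^T; restricting the reweighting matrix to its diagonal yields an IRLS-like iteration. The diagonal restriction, the exponent `alpha`, the ridge term, and the function name `lin_rfm_sparse` are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def lin_rfm_sparse(X, y, n_iter=20, alpha=0.5, ridge=1e-8):
    """Illustrative sketch of lin-RFM for sparse linear regression.

    Alternates between (1) scaling features by a power of the diagonal
    AGOP and (2) solving a (ridge-stabilized) min-norm least squares
    problem in the rescaled space. Assumed details: diagonal AGOP,
    exponent alpha, and the small ridge term for numerical stability.
    """
    n, d = X.shape
    m = np.ones(d)                      # diagonal of the AGOP-based reweighting
    for _ in range(n_iter):
        s = m ** alpha                  # feature scaling, M^alpha on the diagonal
        Phi = X * s                     # design matrix in the transformed space
        # min-norm least squares in the reweighted space (kernel form)
        v = Phi.T @ np.linalg.solve(Phi @ Phi.T + ridge * np.eye(n), y)
        w = s * v                       # predictor pulled back to original coordinates
        m = w ** 2                      # AGOP of a linear model: diag of w w^T
    return w
```

Iterating the rescaling concentrates the diagonal weights on a few coordinates, which is the sense in which the reweighting acts as implicit dimensionality reduction; the choice of `alpha` governs which IRLS variant the iteration corresponds to.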