Fan Jianqing, Yang Zhuoran, Yu Mengxin
Frederick L. Moore '18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at Princeton University.
Ph.D. students at the Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA.
J Am Stat Assoc. 2023;118(544):2315-2328. doi: 10.1080/01621459.2022.2044824. Epub 2022 Mar 27.
In this paper, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of the role played by implicit regularization without excess technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the stepsize is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In addition, our experimental results support our theoretical findings and also demonstrate that our methods empirically outperform classical methods with explicit regularization in terms of both statistical rate and variable selection consistency.
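To illustrate the core mechanism the abstract describes, here is a minimal sketch of implicit regularization via over-parameterization: a sparse signal is recovered by plain gradient descent, with no penalty term, after writing the parameter in the Hadamard form w = u⊙u − v⊙v and starting near the origin with a small stepsize. This toy uses a linear link and noiseless data for simplicity (the paper treats unknown nonlinear links, score transforms, and heavy tails); all variable names and hyperparameters below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 200, 50, 3

# Sparse ground-truth signal and a Gaussian design matrix.
beta = np.zeros(d)
beta[:s] = [1.0, -2.0, 1.5]
X = rng.standard_normal((n, d))
y = X @ beta

# Over-parameterize beta as u*u - v*v and run regularization-free
# gradient descent from a small initialization alpha.
alpha, eta, T = 1e-3, 0.02, 10_000
u = np.full(d, alpha)
v = np.full(d, alpha)
for _ in range(T):
    r = X @ (u * u - v * v) - y   # residual of the least-squares loss
    g = X.T @ r / n               # gradient w.r.t. the linear predictor
    u -= eta * 2 * u * g          # chain rule through  u*u
    v += eta * 2 * v * g          # chain rule through -v*v
beta_hat = u * u - v * v

# Support coordinates escape the origin multiplicatively and converge to
# the true values; off-support coordinates stay trapped near alpha**2,
# mimicking an explicit sparsity penalty without ever adding one.
```

The multiplicative update dynamics are the point of the sketch: coordinates whose gradients are persistently large grow exponentially away from the small initialization, while the rest remain at scale alpha², which is the implicit-regularization effect the paper analyzes.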