Rahul Mazumder, Trevor Hastie, Robert Tibshirani
Department of Statistics, Stanford University.
J Mach Learn Res. 2010 Mar 1;11:2287-2322.
We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts, this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that this task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example, it can obtain a rank-80 approximation of a 10^6 × 10^6 incomplete matrix with 10^5 observed entries in 2.5 hours, and can fit a rank-40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance in both training and test error when compared to other competitive state-of-the-art techniques.