用于学习大型不完整矩阵的谱正则化算法

Spectral Regularization Algorithms for Learning Large Incomplete Matrices.

作者信息

Mazumder Rahul, Hastie Trevor, Tibshirani Robert

机构信息

Department of Statistics, Stanford University.

出版信息

J Mach Learn Res. 2010 Mar 1;11:2287-2322.

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3087301/

Abstract

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example it can obtain a rank-80 approximation of a 10(6) × 10(6) incomplete matrix with 10(5) observed entries in 2.5 hours, and can fit a rank 40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance both in training and test error when compared to other competitive state-of-the art techniques.

摘要

我们使用凸松弛技术为大规模矩阵补全问题提供一系列正则化的低秩解。以核范数作为正则化项，我们提供了一种简单且高效的凸算法，用于在核范数有界的情况下最小化重构误差。我们的算法Soft-Impute迭代地用从软阈值奇异值分解（SVD）得到的元素替换缺失元素。通过热启动，这使我们能够在正则化参数值的网格上高效地计算整个解的正则化路径。我们算法中计算密集的部分是计算一个稠密矩阵的低秩奇异值分解。利用问题结构，我们表明该任务可以以与矩阵维度成线性关系的复杂度来执行。我们的半定规划算法很容易扩展到大型矩阵：例如，它可以在2.5小时内得到一个10(6)×10(6)的不完整矩阵（有10(5)个观测值）的秩为80的近似，并且可以在6.6小时内为完整的Netflix训练集拟合一个秩为40的近似。与其他具有竞争力的先进技术相比，我们的方法在训练误差和测试误差方面都表现出非常好的性能。

相似文献

1

Spectral Regularization Algorithms for Learning Large Incomplete Matrices.

J Mach Learn Res. 2010 Mar 1;11:2287-2322.

2

Logarithmic Norm Regularized Low-Rank Factorization for Matrix and Tensor Completion.

IEEE Trans Image Process. 2021;30:3434-3449. doi: 10.1109/TIP.2021.3061908. Epub 2021 Mar 9.

3

Generalized Unitarily Invariant Gauge Regularization for Fast Low-Rank Matrix Recovery.

IEEE Trans Neural Netw Learn Syst. 2021 Apr;32(4):1627-1641. doi: 10.1109/TNNLS.2020.2985850. Epub 2021 Apr 2.

4

Rank-One Matrix Completion With Automatic Rank Estimation via L1-Norm Regularization.

IEEE Trans Neural Netw Learn Syst. 2018 Oct;29(10):4744-4757. doi: 10.1109/TNNLS.2017.2766160. Epub 2017 Dec 11.

5

Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.

J Mach Learn Res. 2015;16:3367-3402.

6

Fast and accurate matrix completion via truncated nuclear norm regularization.

IEEE Trans Pattern Anal Mach Intell. 2013 Sep;35(9):2117-30. doi: 10.1109/TPAMI.2012.271.

7

Trace Norm Regularized CANDECOMP/PARAFAC Decomposition With Missing Data.

IEEE Trans Cybern. 2015 Nov;45(11):2437-48. doi: 10.1109/TCYB.2014.2374695.

8

Sparse subspace clustering for data with missing entries and high-rank matrix completion.

Neural Netw. 2017 Sep;93:36-44. doi: 10.1016/j.neunet.2017.04.005. Epub 2017 Apr 25.

9

Matrix Completion Based on Non-convex Low Rank Approximation.

IEEE Trans Image Process. 2018 Dec 14. doi: 10.1109/TIP.2018.2886712.

10

Computationally Efficient Truncated Nuclear Norm Minimization for High Dynamic Range Imaging.

IEEE Trans Image Process. 2016 Sep;25(9):4145-57. doi: 10.1109/TIP.2016.2585047. Epub 2016 Jun 24.

引用本文的文献

1

Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models.

J Am Stat Assoc. 2024 Sep 20. doi: 10.1080/01621459.2024.2380105.

2

Haplotype-based insights into seminal root angle in barley.

Plant Genome. 2025 Sep;18(3):e70088. doi: 10.1002/tpg2.70088.

3

Genome-wide association study of cocaine self-administration behavior in Heterogeneous Stock rats.

bioRxiv. 2025 Jul 28:2025.07.17.660811. doi: 10.1101/2025.07.17.660811.

4

Comprehensive analysis of mutational processes across 20 000 adult and pediatric tumors.

Nucleic Acids Res. 2025 Jul 8;53(13). doi: 10.1093/nar/gkaf648.

5

MLOmics: Cancer Multi-Omics Database for Machine Learning.

Sci Data. 2025 May 30;12(1):913. doi: 10.1038/s41597-025-05235-x.

6

HORNET: tools to find genes with causal evidence and their regulatory networks using eQTLs.

Bioinform Adv. 2025 Apr 18;5(1):vbaf068. doi: 10.1093/bioadv/vbaf068. eCollection 2025.

7

Collective Microbial Effects Drive Toxin Bioremediation and Enable Rational Design.

bioRxiv. 2025 Mar 28:2025.03.28.645802. doi: 10.1101/2025.03.28.645802.

8

omicsGMF: a multi-tool for dimensionality reduction, batch correction and imputation applied to bulk- and single cell proteomics data.

bioRxiv. 2025 Mar 28:2025.03.24.644996. doi: 10.1101/2025.03.24.644996.

9

Time-varying mediation analysis for incomplete data with application to DNA methylation study for PTSD.

bioRxiv. 2025 Mar 12:2024.02.06.579228. doi: 10.1101/2024.02.06.579228.

10

PROLONG: penalized regression for outcome guided longitudinal omics analysis with network and group constraints.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf099.

本文引用的文献

1

Missing value estimation methods for DNA microarrays.

Bioinformatics. 2001 Jun;17(6):520-5. doi: 10.1093/bioinformatics/17.6.520.

文献AI研究员

20分钟写一篇综述，助力文献阅读效率提升50倍。

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

文档翻译

学术文献翻译模型，支持多种主流文档格式。