A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.

Authors

Witten Daniela M, Tibshirani Robert, Hastie Trevor

Affiliation

Department of Statistics, Stanford University, Stanford, CA 94305, USA.

Publication information

Biostatistics. 2009 Jul;10(3):515-34. doi: 10.1093/biostatistics/kxp008. Epub 2009 Apr 17.

DOI:10.1093/biostatistics/kxp008

PMID:19377034

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC2697346/

Abstract

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-K approximation for a matrix. We approximate the matrix \(X\) as \(\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T\), where \(d_k\), \(u_k\), and \(v_k\) minimize the squared Frobenius norm of \(X - \hat{X}\), subject to penalties on \(u_k\) and \(v_k\). This results in a regularized version of the singular value decomposition. Of particular interest is the use of \(L_1\)-penalties on \(u_k\) and \(v_k\), which yields a decomposition of \(X\) using sparse vectors. We show that when the PMD is applied using an \(L_1\)-penalty on \(v_k\) but not on \(u_k\), a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
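To make the idea of a rank-K decomposition with \(L_1\)-penalized factors concrete, here is a minimal Python sketch. The abstract does not spell out the algorithm, so several choices here are assumptions: the penalties are written as bound constraints (\(\|u\|_1 \le c_u\), \(\|v\|_1 \le c_v\), with unit \(L_2\) norm), each factor is fitted by alternating soft-thresholded updates with a bisection search for the threshold, and later factors are obtained by deflating the residual. The function names (`soft_threshold`, `l1_constrained_unit_vector`, `pmd_rank1`, `pmd`) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_constrained_unit_vector(a, c, n_bisect=60):
    """Return S(a, delta) / ||S(a, delta)||_2, using delta = 0 if that already
    has L1 norm <= c, and otherwise a delta found by bisection.
    Assumes c >= 1 and no exact ties among the largest |a| entries."""
    if c < 1:
        raise ValueError("c must be at least 1 for a unit-L2-norm vector")
    norm = np.linalg.norm(a)
    if norm == 0.0:
        return a.copy()
    w = a / norm
    if np.abs(w).sum() <= c:
        return w  # L1 constraint inactive, no thresholding needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        s_norm = np.linalg.norm(s)
        if s_norm == 0.0 or np.abs(s).sum() / s_norm <= c:
            hi = mid  # constraint satisfied: try a smaller threshold
        else:
            lo = mid  # constraint violated: threshold must grow
    s = soft_threshold(a, hi)
    return s / np.linalg.norm(s)


def pmd_rank1(X, c_u, c_v, n_iter=100, tol=1e-8):
    """One sparse factor (d, u, v) via alternating updates of u and v."""
    # Start from the leading right singular vector of X.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = l1_constrained_unit_vector(X @ v, c_u)
        v_new = l1_constrained_unit_vector(X.T @ u, c_v)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    d = float(u @ X @ v)
    return d, u, v


def pmd(X, K, c_u, c_v):
    """Rank-K approximation: fit one factor, subtract it, repeat."""
    R = np.array(X, dtype=float)
    factors = []
    for _ in range(K):
        d, u, v = pmd_rank1(R, c_u, c_v)
        factors.append((d, u, v))
        R = R - d * np.outer(u, v)  # deflate the residual
    return factors


# Illustrative data: a sparse rank-1 signal plus Gaussian noise.
rng = np.random.default_rng(0)
u_true = np.zeros(30)
u_true[:5] = 1.0
v_true = np.zeros(40)
v_true[:4] = 1.0
X = 5.0 * np.outer(u_true, v_true) + rng.normal(size=(30, 40))

c_u = 0.5 * np.sqrt(X.shape[0])
c_v = 0.5 * np.sqrt(X.shape[1])
factors = pmd(X, K=2, c_u=c_u, c_v=c_v)
X_hat = sum(d * np.outer(u, v) for d, u, v in factors)
```

Smaller values of `c_u` and `c_v` force more entries of `u` and `v` to exactly zero. Setting `c_u` to at least the square root of the number of rows makes the constraint on `u` inactive, which corresponds to the setting the abstract describes for sparse principal components (a penalty on \(v_k\) but not on \(u_k\)).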

Similar articles

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.

Biostatistics. 2009 Jul;10(3):515-34. doi: 10.1093/biostatistics/kxp008. Epub 2009 Apr 17.

Sparse canonical correlation analysis from a predictive point of view.

Biom J. 2015 Sep;57(5):834-51. doi: 10.1002/bimj.201400226. Epub 2015 Jul 6.

Extensions of sparse canonical correlation analysis with applications to genomic data.

Stat Appl Genet Mol Biol. 2009;8(1):Article28. doi: 10.2202/1544-6115.1470. Epub 2009 Jun 9.

An iterative penalized least squares approach to sparse canonical correlation analysis.

Biometrics. 2019 Sep;75(3):734-744. doi: 10.1111/biom.13043. Epub 2019 Apr 9.

Robust Principal Component Analysis Regularized by Truncated Nuclear Norm for Identifying Differentially Expressed Genes.

IEEE Trans Nanobioscience. 2017 Sep;16(6):447-454. doi: 10.1109/TNB.2017.2723439. Epub 2017 Jul 4.

Integrative factorization of bidimensionally linked matrices.

Biometrics. 2020 Mar;76(1):61-74. doi: 10.1111/biom.13141. Epub 2019 Nov 10.

Group sparse canonical correlation analysis for genomic data integration.

BMC Bioinformatics. 2013 Aug 12;14:245. doi: 10.1186/1471-2105-14-245.

Integrative analysis of gene expression and copy number alterations using canonical correlation analysis.

BMC Bioinformatics. 2010 Apr 15;11:191. doi: 10.1186/1471-2105-11-191.

Robust sparse canonical correlation analysis.

BMC Syst Biol. 2016 Aug 11;10(1):72. doi: 10.1186/s12918-016-0317-9.

Biclustering via sparse singular value decomposition.

Biometrics. 2010 Dec;66(4):1087-95. doi: 10.1111/j.1541-0420.2010.01392.x.

Cited by

Effects of Asprosin and Role of TLR4 as a Biomarker in Endometrial Cancer.

Molecules. 2025 Aug 18;30(16):3410. doi: 10.3390/molecules30163410.

A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf355.

Gene-Based Burden Testing of Rare Variants in Hemiplegic Migraine: A Computational Approach to Uncover the Genetic Architecture of a Rare Brain Disorder.

Genes (Basel). 2025 Jul 9;16(7):807. doi: 10.3390/genes16070807.

A statistical view of column subset selection.

J R Stat Soc Series B Stat Methodol. 2025 May 16. doi: 10.1093/jrsssb/qkaf023.

Integrating multimodal cancer data using deep latent variable path modelling.

Nat Mach Intell. 2025;7(7):1053-1075. doi: 10.1038/s42256-025-01052-4. Epub 2025 Jul 22.

A systematic benchmark of integrative strategies for microbiome-metabolome data.

Commun Biol. 2025 Jul 25;8(1):1100. doi: 10.1038/s42003-025-08515-9.

Tensor decomposition of multi-dimensional splicing events across multiple tissues to identify splicing-mediated risk genes associated with complex traits.

PLoS Comput Biol. 2025 Jul 21;21(7):e1013303. doi: 10.1371/journal.pcbi.1013303. eCollection 2025 Jul.

Social aloofness is associated with non-social explore-exploit decisions.

Commun Psychol. 2025 Jul 15;3(1):106. doi: 10.1038/s44271-025-00278-7.

Fate of antibiotic resistance genes and resistant bacteria under various operating temperatures of sludge anaerobic digestion.

Water Sci Technol. 2025 Jul;92(1):53-65. doi: 10.2166/wst.2025.093. Epub 2025 Jun 30.

Gene set optimization for cancer transcriptomics using sparse principal component analysis.

bioRxiv. 2025 May 26:2025.05.21.655279. doi: 10.1101/2025.05.21.655279.

References

Sparse canonical correlation analysis with application to genomic data integration.

Stat Appl Genet Mol Biol. 2009;8:Article 1. doi: 10.2202/1544-6115.1406. Epub 2009 Jan 6.

Genome-wide sparse canonical correlation of gene expression with genotypes.

BMC Proc. 2007;1 Suppl 1(Suppl 1):S119. doi: 10.1186/1753-6561-1-s1-s119. Epub 2007 Dec 18.

Quantifying the association between gene expressions and DNA-markers by penalized canonical correlation analysis.

Stat Appl Genet Mol Biol. 2008;7(1):Article3. doi: 10.2202/1544-6115.1329. Epub 2008 Jan 23.

Spatial smoothing and hot spot detection for CGH data using the fused lasso.

Biostatistics. 2008 Jan;9(1):18-29. doi: 10.1093/biostatistics/kxm013. Epub 2007 May 18.

Relative impact of nucleotide and copy number variation on gene expression phenotypes.

Science. 2007 Feb 9;315(5813):848-53. doi: 10.1126/science.1136678.

Genomic and transcriptional aberrations linked to breast cancer pathophysiologies.

Cancer Cell. 2006 Dec;10(6):529-41. doi: 10.1016/j.ccr.2006.10.009.

Genome-wide associations of gene expression variation in humans.

PLoS Genet. 2005 Dec;1(6):e78. doi: 10.1371/journal.pgen.0010078. Epub 2005 Dec 16.

Genetic analysis of genome-wide variation in human gene expression.

Nature. 2004 Aug 12;430(7001):743-7. doi: 10.1038/nature02797. Epub 2004 Jul 21.

Impact of DNA amplification on gene expression patterns in breast cancer.

Cancer Res. 2002 Nov 1;62(21):6240-5.

Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors.

Proc Natl Acad Sci U S A. 2002 Oct 1;99(20):12963-8. doi: 10.1073/pnas.162471999. Epub 2002 Sep 24.
