Fundamental limits in structured principal component analysis and how to reach them.

Authors

Barbier Jean, Camilli Francesco, Mondelli Marco, Sáenz Manuel

Affiliations

Quantitative Life Sciences and Mathematics Sections, International Centre for Theoretical Physics, Trieste 34151, Italy.

Institute of Science and Technology Austria, Klosterneuburg 3400, Austria.

Publication information

Proc Natl Acad Sci U S A. 2023 Jul 25;120(30):e2302028120. doi: 10.1073/pnas.2302028120. Epub 2023 Jul 18.

DOI:10.1073/pnas.2302028120

PMID:37463204

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10374165/

Abstract

How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide a characterization of the Bayes optimal limits of inference in this model. If the spike is rotation invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message-passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical physics. We thus propose an AMP, inspired by the theory of adaptive Thouless-Anderson-Palmer equations, which is empirically observed to saturate the conjectured theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at strong universality properties.
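To make the setting concrete, the following is a minimal sketch of a spiked matrix with rotation-invariant noise and of the spectral PCA estimate of the spike. It is an illustration under stated assumptions, not the authors' code: the noise spectrum here is simply drawn uniformly on [-1, 1] rather than from the polynomial ensembles studied in the paper, and the `snr / n` scaling, the Rademacher spike, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an orthogonal matrix from the Haar measure via QR with sign fix."""
    g = rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    # Multiplying each column by the sign of the matching diagonal entry of R
    # removes the sign ambiguity of the QR factorization.
    return q * np.sign(np.diag(r))


def spiked_matrix(n, snr, spectrum, rng):
    """Return Y = (snr / n) x x^T + O diag(spectrum) O^T and the spike x.

    Assumptions (not from the paper): Rademacher spike entries and a
    user-supplied noise spectrum.
    """
    x = rng.choice([-1.0, 1.0], size=n)
    o = haar_orthogonal(n, rng)
    noise = (o * spectrum) @ o.T  # rotation-invariant symmetric noise
    y = (snr / n) * np.outer(x, x) + noise
    return y, x


def pca_overlap(y, x):
    """Normalized overlap between the top eigenvector of y and the spike x."""
    _, eigvecs = np.linalg.eigh(y)  # eigenvalues come back in ascending order
    v = eigvecs[:, -1]
    return abs(v @ x) / np.linalg.norm(x)


rng = np.random.default_rng(0)
n = 400
spectrum = rng.uniform(-1.0, 1.0, size=n)  # hypothetical noise spectrum
y, x = spiked_matrix(n, snr=3.0, spectrum=spectrum, rng=rng)
overlap = pca_overlap(y, x)
print(f"PCA overlap with the spike: {overlap:.3f}")
```

The overlap lies between 0 and 1, with 1 meaning the top eigenvector is aligned with the spike up to sign. The sketch only covers the spectral PCA baseline; the AMP algorithm and the replica predictions described in the abstract are not reproduced here.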

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dac0/10374165/0ed0920cbb37/pnas.2302028120fig01.jpg

Similar articles

Fundamental limits in structured principal component analysis and how to reach them.

Proc Natl Acad Sci U S A. 2023 Jul 25;120(30):e2302028120. doi: 10.1073/pnas.2302028120. Epub 2023 Jul 18.

Optimal errors and phase transitions in high-dimensional generalized linear models.

Proc Natl Acad Sci U S A. 2019 Mar 19;116(12):5451-5460. doi: 10.1073/pnas.1802705116. Epub 2019 Mar 1.

Phase diagram of matrix compressed sensing.

Phys Rev E. 2016 Dec;94(6-1):062136. doi: 10.1103/PhysRevE.94.062136. Epub 2016 Dec 27.

Singular vectors of sums of rectangular random matrices and optimal estimation of high-rank signals: The extensive spike model.

Phys Rev E. 2023 Nov;108(5-1):054129. doi: 10.1103/PhysRevE.108.054129.

Statistical limits of dictionary learning: Random matrix theory and the spectral replica method.

Phys Rev E. 2022 Aug;106(2-1):024136. doi: 10.1103/PhysRevE.106.024136.

No Statistical-Computational Gap in Spiked Matrix Models with Generative Network Priors.

Entropy (Basel). 2021 Jan 16;23(1):115. doi: 10.3390/e23010115.

Approximate message passing from random initialization with applications to synchronization.

Proc Natl Acad Sci U S A. 2023 Aug;120(31):e2302930120. doi: 10.1073/pnas.2302930120. Epub 2023 Jul 25.

Memory-free dynamics for the Thouless-Anderson-Palmer equations of Ising models with arbitrary rotation-invariant ensembles of random coupling matrices.

Phys Rev E. 2019 Jun;99(6-1):062140. doi: 10.1103/PhysRevE.99.062140.

The augmented lagrange multipliers method for matrix completion from corrupted samplings with application to mixed Gaussian-impulse noise removal.

PLoS One. 2014 Sep 23;9(9):e108125. doi: 10.1371/journal.pone.0108125. eCollection 2014.

Adaptive and self-averaging Thouless-Anderson-Palmer mean-field theory for probabilistic modeling.

Phys Rev E Stat Nonlin Soft Matter Phys. 2001 Nov;64(5 Pt 2):056131. doi: 10.1103/PhysRevE.64.056131. Epub 2001 Oct 30.

Cited by

Leveraging Machine Learning for Advanced Nanoscale X-ray Analysis: Unmixing Multicomponent Signals and Enhancing Chemical Quantification.

Nano Lett. 2024 Aug 21;24(33):10177-10185. doi: 10.1021/acs.nanolett.4c02446. Epub 2024 Aug 6.

References

Optimal errors and phase transitions in high-dimensional generalized linear models.

Proc Natl Acad Sci U S A. 2019 Mar 19;116(12):5451-5460. doi: 10.1073/pnas.1802705116. Epub 2019 Mar 1.

The Genotype-Tissue Expression (GTEx) project.

Nat Genet. 2013 Jun;45(6):580-5. doi: 10.1038/ng.2653.

Message-passing algorithms for compressed sensing.

Proc Natl Acad Sci U S A. 2009 Nov 10;106(45):18914-9. doi: 10.1073/pnas.0909892106. Epub 2009 Oct 26.

Adaptive and self-averaging Thouless-Anderson-Palmer mean-field theory for probabilistic modeling.

Phys Rev E Stat Nonlin Soft Matter Phys. 2001 Nov;64(5 Pt 2):056131. doi: 10.1103/PhysRevE.64.056131. Epub 2001 Oct 30.
