Fundamental limits in structured principal component analysis and how to reach them.

Authors

Barbier Jean, Camilli Francesco, Mondelli Marco, Sáenz Manuel

Affiliations

Quantitative Life Sciences and Mathematics Sections, International Centre for Theoretical Physics, Trieste 34151, Italy.

Institute of Science and Technology Austria, Klosterneuburg 3400, Austria.

Publication information

Proc Natl Acad Sci U S A. 2023 Jul 25;120(30):e2302028120. doi: 10.1073/pnas.2302028120. Epub 2023 Jul 18.

Abstract

How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide a characterization of the Bayes optimal limits of inference in this model. If the spike is rotation invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message-passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical physics. We thus propose an AMP, inspired by the theory of adaptive Thouless-Anderson-Palmer equations, which is empirically observed to saturate the conjectured theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at strong universality properties.
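
To make the setting concrete, the following is a minimal sketch of spectral PCA on a spiked matrix. It is not the authors' code and it does not implement their noise model: the noise here has independent Gaussian entries (the usual assumption that the paper goes beyond), and the ±1 spike, the `snr` scaling, and all function names are illustrative assumptions.

```python
import numpy as np


def spiked_matrix(n, snr, rng):
    """Return (y, x): a rank-one spike plus symmetric Gaussian noise.

    Assumption: independent Gaussian noise, NOT the correlated
    rotation-invariant ensemble studied in the paper.
    """
    x = rng.choice([-1.0, 1.0], size=n)      # hypothetical +/-1 spike
    g = rng.normal(size=(n, n))
    noise = (g + g.T) / np.sqrt(2 * n)       # symmetric, off-diagonal variance 1/n
    y = (snr / n) * np.outer(x, x) + noise   # spike contributes eigenvalue snr
    return y, x


def pca_estimate(y):
    """Spectral PCA: top eigenvector of y, rescaled to norm sqrt(n)."""
    _, eigvecs = np.linalg.eigh(y)           # eigenvalues in ascending order
    v = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
    return v * np.sqrt(len(v))


def overlap(x_hat, x):
    """Absolute cosine similarity; the sign of the spike is not identifiable."""
    return abs(x_hat @ x) / (np.linalg.norm(x_hat) * np.linalg.norm(x))


rng = np.random.default_rng(0)
y, x = spiked_matrix(n=400, snr=3.0, rng=rng)
x_hat = pca_estimate(y)
print(f"overlap between PCA estimate and spike: {overlap(x_hat, x):.3f}")
```

With this scaling and a signal strength well above the spectral threshold, the top eigenvector should correlate strongly with the spike. Reproducing the paper's results would require replacing the Gaussian noise with a draw from the structured ensemble and comparing PCA against the proposed AMP.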

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dac0/10374165/0ed0920cbb37/pnas.2302028120fig01.jpg
