Barbier Jean, Camilli Francesco, Mondelli Marco, Sáenz Manuel
Quantitative Life Sciences and Mathematics Sections, International Centre for Theoretical Physics, Trieste 34151, Italy.
Institute of Science and Technology Austria, Klosterneuburg 3400, Austria.
Proc Natl Acad Sci U S A. 2023 Jul 25;120(30):e2302028120. doi: 10.1073/pnas.2302028120. Epub 2023 Jul 18.
How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide a characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message-passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical physics. We thus propose an AMP, inspired by the theory of adaptive Thouless-Anderson-Palmer equations, which is empirically observed to saturate the conjectured theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at strong universality properties.
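To make the setting concrete, the following is a minimal sketch (not the authors' code) of a rank-one spiked matrix corrupted by rotation-invariant noise, together with the spectral PCA baseline discussed above. Several assumptions are made for illustration: the noise is built as O D Oᵀ with O Haar-distributed and a hypothetical uniform spectrum D on [-√3, √3], standing in for the paper's low-order polynomial ensemble; the spike x has a ±1 (Rademacher) prior, an example of a prior that is not rotation invariant; and all function names are hypothetical. The sketch does not implement the proposed AMP.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample a Haar-distributed n x n orthogonal matrix (QR of a Gaussian matrix)."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Fix column signs so the distribution is exactly Haar, not biased by QR conventions.
    return q * np.sign(np.diag(r))


def rotation_invariant_noise(eigs, rng):
    """Return O diag(eigs) O^T: its law is invariant under conjugation by orthogonal matrices."""
    o = haar_orthogonal(len(eigs), rng)
    return (o * eigs) @ o.T


def spiked_matrix(x, snr, noise):
    """Rank-one spike (snr / n) x x^T plus additive noise."""
    n = len(x)
    return (snr / n) * np.outer(x, x) + noise


def pca_overlap(y, x):
    """Normalized overlap |<v, x>| / ||x|| between x and the top eigenvector v of y."""
    _, vecs = np.linalg.eigh(y)
    top = vecs[:, -1]  # eigh sorts eigenvalues ascending; last column is the top eigenvector
    return abs(top @ x) / np.linalg.norm(x)


def demo(n=1000, snr=3.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)                      # hypothetical Rademacher spike
    eigs = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)  # hypothetical unit-variance spectrum
    y = spiked_matrix(x, snr, rotation_invariant_noise(eigs, rng))
    return pca_overlap(y, x)


if __name__ == "__main__":
    print(f"PCA overlap: {demo():.3f}")
```

The overlap printed by `demo()` measures how well plain spectral PCA recovers the spike. Under the abstract's claims, for a non-rotation-invariant prior such as this one, PCA is not expected to reach the Bayes-optimal limit; comparing against that limit would require the replica formulas and the adapted AMP from the paper, which are not reproduced here.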