

Far from Asymptopia: Unbiased High-Dimensional Inference Cannot Assume Unlimited Data.

Author Information

Abbott Michael C, Machta Benjamin B

Affiliations

Department of Physics, Yale University, New Haven, CT 06520, USA.

Publication Information

Entropy (Basel). 2023 Mar 1;25(3):434. doi: 10.3390/e25030434.

Abstract

Inference from limited data requires a notion of measure on parameter space, which is most explicit in the Bayesian framework as a prior distribution. Jeffreys prior is the best-known uninformative choice, the invariant volume element from information geometry, but we demonstrate here that this leads to enormous bias in typical high-dimensional models. This is because models found in science typically have an effective dimensionality of accessible behaviors much smaller than the number of microscopic parameters. Any measure which treats all of these parameters equally is far from uniform when projected onto the sub-space of relevant parameters, due to variations in the local co-volume of irrelevant directions. We present results on a principled choice of measure which avoids this issue and leads to unbiased posteriors by focusing on relevant parameters. This optimal prior depends on the quantity of data to be gathered, and approaches Jeffreys prior in the asymptotic limit. However, for typical models, this limit cannot be justified without an impossibly large increase in the quantity of data, exponential in the number of microscopic parameters.
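The abstract describes Jeffreys prior as the invariant volume element of information geometry, i.e. √det I(θ) for Fisher information I(θ). A minimal sketch of this construction, for the one-parameter Bernoulli model (an illustrative example not taken from the paper; function names are hypothetical):

```python
import numpy as np

def fisher_information_bernoulli(theta):
    # Fisher information of a single Bernoulli trial: I(theta) = 1 / (theta * (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_density_unnormalized(theta):
    # Jeffreys prior is the invariant volume element sqrt(det I(theta));
    # in one dimension, det I(theta) is just I(theta).
    return np.sqrt(fisher_information_bernoulli(theta))

# For the Bernoulli model this is proportional to Beta(1/2, 1/2),
# i.e. theta^(-1/2) * (1 - theta)^(-1/2):
theta = np.linspace(0.01, 0.99, 99)
ratio = jeffreys_density_unnormalized(theta) / (theta**-0.5 * (1.0 - theta)**-0.5)
assert np.allclose(ratio, 1.0)
```

In higher dimensions the same recipe uses √det of the full Fisher information matrix; the paper's point is that for models with many microscopic parameters but few relevant ones, this volume element concentrates mass far from uniformly on the relevant sub-space.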


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa58/10048238/6ed6a87519f0/entropy-25-00434-g0A1.jpg
