

Far from Asymptopia: Unbiased High-Dimensional Inference Cannot Assume Unlimited Data.

Author Information

Abbott Michael C, Machta Benjamin B

Affiliations

Department of Physics, Yale University, New Haven, CT 06520, USA.

Publication Information

Entropy (Basel). 2023 Mar 1;25(3):434. doi: 10.3390/e25030434.

Abstract

Inference from limited data requires a notion of measure on parameter space, which is most explicit in the Bayesian framework as a prior distribution. Jeffreys prior is the best-known uninformative choice, the invariant volume element from information geometry, but we demonstrate here that this leads to enormous bias in typical high-dimensional models. This is because models found in science typically have an effective dimensionality of accessible behaviors much smaller than the number of microscopic parameters. Any measure which treats all of these parameters equally is far from uniform when projected onto the sub-space of relevant parameters, due to variations in the local co-volume of irrelevant directions. We present results on a principled choice of measure which avoids this issue and leads to unbiased posteriors by focusing on relevant parameters. This optimal prior depends on the quantity of data to be gathered, and approaches Jeffreys prior in the asymptotic limit. However, for typical models, this limit cannot be justified without an impossibly large increase in the quantity of data, exponential in the number of microscopic parameters.
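The abstract describes Jeffreys prior as the invariant volume element of information geometry, i.e. √det I(θ) for Fisher information I(θ). A minimal sketch of this construction, for the one-parameter Bernoulli model (an illustrative example not taken from the paper; function names are hypothetical):

```python
import numpy as np

def fisher_information_bernoulli(theta):
    # Fisher information of a single Bernoulli trial: I(theta) = 1 / (theta * (1 - theta))
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_density_unnormalized(theta):
    # Jeffreys prior is the invariant volume element sqrt(det I(theta));
    # in one dimension, det I(theta) is just I(theta).
    return np.sqrt(fisher_information_bernoulli(theta))

# For the Bernoulli model this is proportional to Beta(1/2, 1/2),
# i.e. theta^(-1/2) * (1 - theta)^(-1/2):
theta = np.linspace(0.01, 0.99, 99)
ratio = jeffreys_density_unnormalized(theta) / (theta**-0.5 * (1.0 - theta)**-0.5)
assert np.allclose(ratio, 1.0)
```

In higher dimensions the same recipe uses √det of the full Fisher information matrix; the paper's point is that for models with many microscopic parameters but few relevant ones, this volume element concentrates mass far from uniformly on the relevant sub-space.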


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa58/10048238/6ed6a87519f0/entropy-25-00434-g0A1.jpg
