Identifiability and inference of phylogenetic birth-death models.

Authors

Legried Brandon, Terhorst Jonathan

Affiliations

School of Mathematics, Georgia Institute of Technology, 686 Cherry Street, Atlanta, 30332, GA, USA.

Department of Statistics, University of Michigan, 1085 S. University Ave, Ann Arbor, 48109, MI, USA.

Publication information

J Theor Biol. 2023 Jul 7;568:111520. doi: 10.1016/j.jtbi.2023.111520. Epub 2023 May 4.

DOI:10.1016/j.jtbi.2023.111520

PMID:37148965

Abstract

Recent theoretical work on phylogenetic birth-death models offers differing viewpoints on whether they can be estimated using lineage-through-time data. Louca and Pennell (2020) showed that the class of models with continuously differentiable rate functions is nonidentifiable: any such model is consistent with an infinite collection of alternative models, which are statistically indistinguishable regardless of how much data are collected. Legried and Terhorst (2022) qualified this grave result by showing that identifiability is restored if only piecewise constant rate functions are considered. Here, we contribute new theoretical results to this discussion, in both the positive and negative directions. Our main result is to prove that models based on piecewise polynomial rate functions of any order and with any (finite) number of pieces are statistically identifiable. In particular, this implies that spline-based models with an arbitrary number of knots are identifiable. The proof is simple and self-contained, relying mainly on basic algebra. We complement this positive result with a negative one, which shows that even when identifiability holds, rate function estimation is still a difficult problem. To illustrate this, we prove some rates-of-convergence results for hypothesis testing using birth-death models. These results are information-theoretic lower bounds which apply to all potential estimators.
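To make the model class in the main result concrete, the following is a minimal sketch of a piecewise polynomial rate function, with the piecewise constant case as the degree-zero special case. The class name `PiecewisePolyRate`, the local-coordinate parameterization, and all numeric values are assumptions chosen for illustration; none of them come from the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PiecewisePolyRate:
    """Hypothetical container for a piecewise polynomial rate function.

    breakpoints: strictly increasing interior breakpoints t_1 < ... < t_{K-1}.
    coeffs: K coefficient lists, one per piece, lowest degree first. Piece k
        evaluates to sum(c[j] * (t - start_k) ** j), where start_k is the left
        endpoint of the piece (0.0 for the first piece).
    """

    breakpoints: Sequence[float]
    coeffs: Sequence[Sequence[float]]

    def __post_init__(self):
        if len(self.coeffs) != len(self.breakpoints) + 1:
            raise ValueError("need exactly one coefficient list per piece")
        if any(a >= b for a, b in zip(self.breakpoints, self.breakpoints[1:])):
            raise ValueError("breakpoints must be strictly increasing")

    def __call__(self, t: float) -> float:
        # Index of the piece containing t (pieces are closed on the left).
        k = bisect_right(self.breakpoints, t)
        start = 0.0 if k == 0 else self.breakpoints[k - 1]
        x = t - start
        value = 0.0
        for c in reversed(self.coeffs[k]):  # Horner's rule
            value = value * x + c
        return value


# Illustrative values only, not taken from the paper.
# Birth rate: constant, then linear, then quadratic (continuous at the knots).
birth = PiecewisePolyRate(
    breakpoints=[1.0, 2.5],
    coeffs=[[1.0], [1.0, 0.5], [1.75, 0.0, -0.1]],
)
# Death rate: a single constant piece, the degree-zero special case.
death = PiecewisePolyRate(breakpoints=[], coeffs=[[0.3]])
```

Calling `birth(2.0)` evaluates the second piece at local coordinate 1.0. A model in this class is described by finitely many breakpoints and coefficients, in contrast to an arbitrary continuously differentiable rate function.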

Similar articles

A class of identifiable phylogenetic birth-death models.

Proc Natl Acad Sci U S A. 2022 Aug 30;119(35):e2119513119. doi: 10.1073/pnas.2119513119. Epub 2022 Aug 22.

Estimating trees from filtered data: identifiability of models for morphological phylogenetics.

J Theor Biol. 2010 Mar 7;263(1):108-19. doi: 10.1016/j.jtbi.2009.12.001. Epub 2009 Dec 11.

Identifiability of parameters in MCMC Bayesian inference of phylogeny.

Syst Biol. 2002 Oct;51(5):754-60. doi: 10.1080/10635150290102429.

A confidence building exercise in data and identifiability: Modeling cancer chemotherapy as a case study.

J Theor Biol. 2017 Oct 27;431:63-78. doi: 10.1016/j.jtbi.2017.07.018. Epub 2017 Jul 19.

Identifiability and numerical algebraic geometry.

PLoS One. 2019 Dec 13;14(12):e0226299. doi: 10.1371/journal.pone.0226299. eCollection 2019.

Identifiability of two-tree mixtures for group-based models.

IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):710-22. doi: 10.1109/TCBB.2010.79.

Consistency and identifiability of the polymorphism-aware phylogenetic models.

J Theor Biol. 2020 Feb 7;486:110074. doi: 10.1016/j.jtbi.2019.110074. Epub 2019 Nov 8.

Identifiability of tree-child phylogenetic networks under a probabilistic recombination-mutation model of evolution.

J Theor Biol. 2018 Jun 7;446:160-167. doi: 10.1016/j.jtbi.2018.03.011. Epub 2018 Mar 13.

Descartes' rule of signs and the identifiability of population demographic models from genomic variation data.

Ann Stat. 2014;42(6):2469-2493. doi: 10.1214/14-AOS1264. Epub 2014 Oct 20.

Cited by

Exact and efficient phylodynamic simulation from arbitrarily large populations.

Proc Natl Acad Sci U S A. 2025 May 20;122(20):e2412978122. doi: 10.1073/pnas.2412978122. Epub 2025 May 14.

The effects of cryptic diversity on diversification dynamics analyses in Crocodylia.

Proc Biol Sci. 2025 Mar;292(2043):20250091. doi: 10.1098/rspb.2025.0091. Epub 2025 Mar 19.

Evolutionary and epidemic dynamics of COVID-19 in Germany exemplified by three Bayesian phylodynamic case studies.

Bioinform Biol Insights. 2025 Mar 12;19:11779322251321065. doi: 10.1177/11779322251321065. eCollection 2025.

Reading tree leaves: inferring speciation and extinction processes using phylogenies.

Philos Trans R Soc Lond B Biol Sci. 2025 Feb 13;380(1919):20230309. doi: 10.1098/rstb.2023.0309. Epub 2025 Feb 20.

The Fossilized Birth-Death Model Is Identifiable.

Syst Biol. 2025 Feb 10;74(1):112-123. doi: 10.1093/sysbio/syae058.

A Diffusion-Based Approach for Simulating Forward-in-Time State-Dependent Speciation and Extinction Dynamics.

Bull Math Biol. 2024 Jul 6;86(8):101. doi: 10.1007/s11538-024-01337-6.

Exact and efficient phylodynamic simulation from arbitrarily large populations.

ArXiv. 2024 Aug 10:arXiv:2402.17153v2.

A Diffusion-Based Approach for Simulating Forward-in-Time State-Dependent Speciation and Extinction Dynamics.

ArXiv. 2024 Jun 24:arXiv:2402.00246v2.

Alternate histories in macroevolution.

Proc Natl Acad Sci U S A. 2023 Feb 28;120(9):e2300967120. doi: 10.1073/pnas.2300967120. Epub 2023 Feb 21.
