Truman Kate, Vaughan Timothy G, Gavryushkin Alex, Gavryushkina Alexandra Sasha
Biological Data Science Laboratory, School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand.
Biomathematics Research Centre, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand.
Syst Biol. 2025 Feb 10;74(1):112-123. doi: 10.1093/sysbio/syae058.
Time-dependent birth-death sampling models have been used in numerous studies to infer past evolutionary dynamics in different biological contexts, for example speciation and extinction rates in macroevolutionary studies, or the effective reproductive number in epidemiological studies. These models are branching processes in which lineages can bifurcate, die, or be sampled with time-dependent birth, death, and sampling rates, generating phylogenetic trees. It has been shown that in some subclasses of such models, different sets of rates can result in the same distribution of reconstructed phylogenetic trees, and therefore the rates are unidentifiable from the trees regardless of tree size. Here, we show that widely used time-dependent fossilized birth-death (FBD) models are identifiable. This subclass of models makes more realistic assumptions about the fossilization process and certain infectious disease transmission processes than the unidentifiable birth-death sampling models. Namely, FBD models assume that sampled lineages stay in the process rather than being immediately removed upon sampling. The identifiability of the time-dependent FBD model justifies using statistical methods that implement this model to infer the underlying temporal diversification or epidemiological dynamics from phylogenetic trees or directly from molecular or other comparative data. We further show that the time-dependent FBD model with an extra parameter, the removal-after-sampling probability, is unidentifiable. This implies that in scenarios where we do not know how sampling affects lineages, we are unable to infer this extra parameter together with the birth, death, and sampling rates solely from trees.
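The distinction the abstract draws between FBD models (sampled lineages stay in the process) and models with removal upon sampling can be illustrated with a small forward simulation. The following is only a sketch under stated assumptions, not the authors' method: the function name `simulate_counts`, the `epochs` format, and the use of piecewise-constant rates as a stand-in for general time-dependent rates are all hypothetical choices made for illustration. It tracks lineage and sample counts only and does not build a phylogenetic tree.

```python
import random


def simulate_counts(epochs, r, t_end, n0=1, seed=0):
    """Simulate a birth-death-sampling process forward in time.

    Assumptions (illustrative, not from the paper):
    - epochs: list of (start_time, birth, death, sampling) tuples sorted by
      start_time, with the first epoch starting at time 0; rates are held
      constant within each epoch.
    - r: probability that a lineage is removed when it is sampled.
      r = 0 corresponds to the FBD assumption (sampled lineages persist);
      r = 1 removes every lineage at the moment it is sampled.
    Returns (number of living lineages at t_end, number of samples taken).
    """
    rng = random.Random(seed)
    t, n, samples = 0.0, n0, 0
    for i, (_start, lam, mu, psi) in enumerate(epochs):
        # The epoch ends at the next epoch's start or at t_end, whichever is first.
        nxt = epochs[i + 1][0] if i + 1 < len(epochs) else t_end
        end = min(nxt, t_end)
        per_lineage = lam + mu + psi
        while n > 0 and per_lineage > 0:
            # Waiting time to the next event among n independent lineages.
            t_next = t + rng.expovariate(n * per_lineage)
            if t_next >= end:
                break  # no further event in this epoch (memoryless restart)
            t = t_next
            u = rng.random() * per_lineage
            if u < lam:
                n += 1            # birth: one lineage bifurcates
            elif u < lam + mu:
                n -= 1            # death
            else:
                samples += 1      # sampling event
                if rng.random() < r:
                    n -= 1        # lineage removed upon sampling
        t = end
    return n, samples


if __name__ == "__main__":
    # Two hypothetical epochs with different rates; values are arbitrary.
    epochs = [(0.0, 1.0, 0.5, 0.2), (2.0, 0.6, 0.5, 0.4)]
    for r in (0.0, 0.5, 1.0):
        print(r, simulate_counts(epochs, r, t_end=4.0, seed=1))
```

Setting `r = 0` gives the FBD behaviour described above, while intermediate values of `r` correspond to the extended model with a removal-after-sampling probability, which the abstract reports cannot be inferred from trees together with the birth, death, and sampling rates.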