Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France.
Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
Syst Biol. 2020 May 1;69(3):521-529. doi: 10.1093/sysbio/syz054.
Reconstructing ancestral characters and traits along a phylogenetic tree is central to evolutionary biology. It is the key to understanding morphological changes among species, inferring ancestral biochemical properties of life, or recovering migration routes in phylogeography. The goal is twofold: to reconstruct the character state at the tree root (e.g., the region of origin of some species) and to understand the process of state changes along the tree (e.g., species flow between countries). We deal here with discrete characters, which are "unique," as opposed to sequence characters (nucleotides or amino acids), where we assume the same model for all the characters (or for large classes of characters with site-dependent models) and thus benefit from multiple information sources. In this framework, we use mathematics and simulations to demonstrate that although each goal can be achieved with high accuracy individually, it is generally impossible to accurately estimate both the root state and the rates of state changes along the tree branches from the data observed at the tips of the tree. This is because the global rates of state changes along the branches that are optimal for the two estimation tasks have opposite trends, leading to a fundamental trade-off in accuracy. This inherent "Darwinian uncertainty principle" concerning the simultaneous estimation of "patterns" and "processes" governs ancestral reconstructions in biology. For certain tree shapes (typically speciation trees), the uncertainty of simultaneous estimation is reduced when more tips are present; for other tree shapes (e.g., coalescent trees used in population genetics), it is not.
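The opposite trends described above can be illustrated with a toy calculation. The sketch below is not the authors' analysis. It assumes a symmetric two-state Markov model with rate `mu`, a star tree whose `n_tips` branches all have length `t`, a majority-rule estimate of the root state, and, for the rate, the Fisher information carried by the number of tips that differ from a root state treated as known. All function names are hypothetical.

```python
import math


def change_probability(mu: float, t: float) -> float:
    """P(tip state differs from root state) under a symmetric two-state model."""
    return 0.5 * (1.0 - math.exp(-2.0 * mu * t))


def root_accuracy(n_tips: int, mu: float, t: float = 1.0) -> float:
    """P(majority rule recovers the root state) on a star tree (assumed shape)."""
    if n_tips % 2 == 0:
        raise ValueError("use an odd number of tips to avoid ties")
    p = change_probability(mu, t)
    # Majority rule is correct when fewer than half of the tips have changed.
    return sum(
        math.comb(n_tips, k) * p**k * (1.0 - p) ** (n_tips - k)
        for k in range((n_tips + 1) // 2)
    )


def rate_relative_precision(n_tips: int, mu: float, t: float = 1.0) -> float:
    """mu^2 times the Fisher information about mu (root state assumed known)."""
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    p = change_probability(mu, t)
    dp_dmu = t * math.exp(-2.0 * mu * t)
    # Fisher information of a Binomial(n_tips, p(mu)) count with respect to mu.
    fisher = n_tips * dp_dmu**2 / (p * (1.0 - p))
    return mu**2 * fisher


if __name__ == "__main__":
    for mu in (0.01, 0.1, 0.5, 1.0, 5.0):
        acc = root_accuracy(11, mu)
        prec = rate_relative_precision(11, mu)
        print(f"mu={mu:<5} root accuracy={acc:.3f} rate precision={prec:.4f}")
```

Under these assumptions, root accuracy is highest as `mu` approaches zero, whereas the relative precision of the rate estimate vanishes for both very small and very large `mu` and peaks in between, so no single rate is best for both tasks.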