Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.
Department of Biology, Aarhus University, 8000 Aarhus C, Denmark.
Syst Biol. 2022 Aug 10;71(5):1124-1146. doi: 10.1093/sysbio/syac012.
Phylogenetic analyses are increasingly being performed with data sets that incorporate hundreds of loci. Due to incomplete lineage sorting, hybridization, and horizontal gene transfer, the gene trees for these loci may often have topologies that differ from each other and from the species tree. The effect of these topological incongruences on divergence time estimation has not been fully investigated. Using a series of simulation experiments and empirical analyses, we demonstrate that when topological incongruence between gene trees and the species tree is not accounted for, the temporal duration of branches in regions of the species tree that are affected by incongruence is underestimated, whilst the duration of other branches is considerably overestimated. This effect becomes more pronounced with higher levels of topological incongruence. We show that this pattern results from the erroneous estimation of the number of substitutions along branches in the species tree, although the effect is modulated by the assumptions inherent to divergence time estimation, such as those relating to the fossil record or among-branch-substitution-rate variation. By only analyzing loci with gene trees that are topologically congruent with the species tree, or only taking into account the branches from each gene tree that are topologically congruent with the species tree, we demonstrate that the effects of topological incongruence can be ameliorated. Nonetheless, even when topologically congruent gene trees or topologically congruent branches are selected, error in divergence time estimates remains. This stems from temporal incongruences between divergence times in species trees and divergence times in gene trees, and more importantly, the difficulty of incorporating necessary assumptions for divergence time estimation. [Divergence time estimation; gene trees; species tree; topological incongruence.]