Stadler Tanja, Degnan James H, Rosenberg Noah A
Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, 4058 Basel; Swiss Institute of Bioinformatics (SIB), 1015 Lausanne, Switzerland
Department of Mathematics and Statistics, University of New Mexico, 311 Terrace NE, Albuquerque, NM, 87131, USA
Syst Biol. 2016 Jul;65(4):628-39. doi: 10.1093/sysbio/syw019. Epub 2016 Mar 11.
Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root than phylogenies predicted by common null models. This difference might arise because null models of the speciation and extinction process are too simplistic, or because the empirical datasets are not representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth-death and multispecies coalescent model can explain the difference between empirical trees and birth-death species trees. We simulate gene trees embedded in simulated species trees and compare the two with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and gene trees simulated under the model can account for an imbalance increase of up to 8% relative to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to biased conclusions.
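To make the notion of an "imbalance increase" concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes the Colless index as the balance statistic (the abstract does not name one), draws species tree shapes from a pure-birth (Yule) null model using the standard uniform root-split property, and reports a relative increase in percent. The function names `colless`, `yule_shape`, and `relative_increase` are hypothetical, and the multispecies coalescent step that produces gene trees is not simulated here.

```python
import random


def colless(tree):
    """Return (number_of_tips, Colless index) for a nested-tuple binary tree.

    A tip is any non-tuple object; an internal node is a 2-tuple (left, right).
    The Colless index sums |tips_left - tips_right| over all internal nodes.
    """
    if not isinstance(tree, tuple):
        return 1, 0
    n_left, c_left = colless(tree[0])
    n_right, c_right = colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def yule_shape(n_tips, rng):
    """Draw a random tree shape with n_tips tips under the Yule model.

    Assumption used here: under the Yule model the number of tips on one
    side of the root is uniform on 1..n_tips-1, and each subtree is again
    a Yule tree, so the shape can be built by recursive splitting.
    """
    if n_tips == 1:
        return "tip"
    k = rng.randint(1, n_tips - 1)  # inclusive on both ends
    return (yule_shape(k, rng), yule_shape(n_tips - k, rng))


def relative_increase(reference, observed):
    """Percent increase of `observed` over `reference`."""
    return 100.0 * (observed - reference) / reference


if __name__ == "__main__":
    rng = random.Random(2016)  # fixed seed so the run is repeatable
    values = [colless(yule_shape(50, rng))[1] for _ in range(1000)]
    mean_species = sum(values) / len(values)
    print("mean Colless index, Yule trees with 50 tips:", mean_species)
    # A hypothetical gene-tree mean that is 8% larger, for illustration only.
    print("relative increase (%):",
          relative_increase(mean_species, 1.08 * mean_species))
```

In this framing, the 8% and 100% figures in the abstract would correspond to `relative_increase` applied to the mean imbalance of species trees versus gene trees and versus empirical trees, respectively; which statistic and tree sizes the study actually used should be taken from the paper itself.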