Warnow Tandy
Department of Computer Science, University of Illinois at Urbana-Champaign. Urbana, Illinois, USA.
PLoS Curr. 2015 May 22;7:ecurrents.currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7. doi: 10.1371/currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7.
Incomplete lineage sorting (ILS), modelled by the multi-species coalescent, is a process that can cause a gene tree to differ from the species tree. Because ILS is expected to occur for at least some loci within genome-scale analyses, the evaluation of species tree estimation methods in the presence of ILS is of great interest. Performance on simulated and biological data has suggested that concatenation analyses can result in the wrong tree with high support under some conditions, and a recent theoretical result by Roch and Steel proved that concatenation using unpartitioned maximum likelihood analysis can be statistically inconsistent in the presence of ILS. In this study, we survey the major species tree estimation methods, including the newly proposed "statistical binning" methods, and discuss their theoretical properties. We also note that there are two interpretations of the term "statistical consistency", and discuss the theoretical results proven under both interpretations.