Roch Sebastien, Warnow Tandy
Department of Mathematics, University of Wisconsin at Madison, 480 Lincoln Dr., Madison, Wisconsin, 53706, USA and Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Syst Biol. 2015 Jul;64(4):663-76. doi: 10.1093/sysbio/syv016. Epub 2015 Mar 25.
The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) owing to multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between the promise of coalescent-based methods in theory and their performance in practice.