Gatesy John, Springer Mark S
Department of Biology, University of California, Riverside, CA 92521, USA.
Mol Phylogenet Evol. 2014 Nov;80:231-66. doi: 10.1016/j.ympev.2014.08.013. Epub 2014 Aug 22.
Large datasets are required to solve difficult phylogenetic problems that are deep in the Tree of Life. Currently, two divergent systematic methods are commonly applied to such datasets: the traditional supermatrix approach (= concatenation) and "shortcut" coalescence (= coalescence methods wherein gene trees and the species tree are not co-estimated). When applied to ancient clades, these contrasting frameworks often produce congruent results, but in recent phylogenetic analyses of Placentalia (placental mammals), this is not the case. A recent series of papers has alternately disputed and defended the utility of shortcut coalescence methods at deep phylogenetic scales. Here, we examine this exchange in the context of published phylogenomic data from Mammalia; in particular, we explore two critical issues: the delimitation of data partitions ("genes") in coalescence analysis, and the hidden support that emerges when such partitions are combined in phylogenetic studies. Hidden support, meaning increased support for a clade in combined analysis of all data partitions relative to the support evident in separate analyses of the various data partitions, is a hallmark of the supermatrix approach and a primary rationale for concatenating all characters into a single matrix. In the most extreme cases of hidden support, relationships that are contradicted by all gene trees are supported when all of the genes are analyzed together. A valid fear is that shortcut coalescence methods might bypass or distort character support that is hidden in individual loci because small gene fragments are analyzed in isolation. Given the extensive systematic database for Mammalia, the assumptions and applicability of shortcut coalescence methods can be assessed with rigor to complement a small but growing body of simulation work that has directly compared these methods to concatenation.
We document several remarkable cases of hidden support in both supermatrix and coalescence paradigms and argue that in most instances, the emergent support in the shortcut coalescence analyses is an artifact. By referencing rigorous molecular clock studies of Mammalia, we suggest that inaccurate gene trees that imply unrealistically deep coalescences debilitate shortcut coalescence analyses of the placental dataset. We document contradictory coalescence results for Placentalia, and outline a critical conundrum that challenges the general utility of shortcut coalescence methods at deep phylogenetic scales. In particular, the basic unit of analysis in coalescence analysis, the coalescence-gene, is expected to shrink in size as more taxa are analyzed, but as the amount of data for reconstruction of a gene tree ratchets downward, the number of nodes in the gene tree that need to be resolved ratchets upward. Some advocates of shortcut coalescence methods have attempted to address problems with inaccurate gene trees by concatenating multiple coalescence-genes to yield "gene trees" that better match the species tree. However, this hybrid concatenation/coalescence approach, "concatalescence," contradicts the most basic biological rationale for performing a coalescence analysis in the first place. We discuss this reality in the context of recent simulation work that also suggests inaccurate reconstruction of gene trees is more problematic for shortcut coalescence methods than deep coalescence of independently segregating loci is for concatenation methods.
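The definition of hidden support given above (support in the combined analysis relative to support in the separate analyses) can be illustrated with a minimal sketch. It assumes a support measure that can be summed across partitions, such as a Bremer-style branch support score in which negative values mean a partition contradicts the clade. The function name and all numbers are hypothetical and are not taken from the mammalian datasets discussed here.

```python
def hidden_support(combined_support, partition_supports):
    """Return combined-analysis support minus the summed support
    found when each data partition is analyzed separately.

    Assumption: support scores are additive across partitions
    (e.g. Bremer-style scores; negative = partition contradicts the clade).
    """
    return combined_support - sum(partition_supports)


# Hypothetical extreme case: every gene partition contradicts the clade
# when analyzed alone, yet the concatenated matrix supports it.
partition_supports = [-1, -2, -1]
combined_support = 3

print(hidden_support(combined_support, partition_supports))  # 3 - (-4) = 7
```

A positive result indicates support that emerges only when the partitions are combined; a result of zero means the combined analysis adds nothing beyond the separate analyses.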