1 Department of Computer Science, The University of Auckland, Auckland 1010, New Zealand.
2 Department of Biology, Indiana University, Bloomington, IN 47405, USA.
Philos Trans R Soc Lond B Biol Sci. 2019 Jul 22;374(1777):20180244. doi: 10.1098/rstb.2018.0244. Epub 2019 Jun 3.
Accurate inferences of convergence require that the appropriate tree topology be used. If there is a mismatch between the tree a trait has evolved along and the tree used for analysis, then false inferences of convergence ('hemiplasy') can occur. To avoid problems of hemiplasy when there are high levels of gene tree discordance with the species tree, researchers have begun to construct tree topologies from individual loci. However, due to intralocus recombination, even locus-specific trees may contain multiple topologies within them. This implies that the use of individual tree topologies discordant with the species tree can still lead to incorrect inferences about molecular convergence. Here, we examine the frequency with which single exons and single protein-coding genes contain multiple underlying tree topologies, in primates and Drosophila, and quantify the effects of hemiplasy when using trees inferred from individual loci. In both clades, we find that there are most often multiple diagnosable topologies within single exons and whole genes, with 91% of Drosophila protein-coding genes containing multiple topologies. Because of this underlying topological heterogeneity, even using trees inferred from individual protein-coding genes results in 25% and 38% of substitutions falsely labelled as convergent in primates and Drosophila, respectively. While constructing local trees can reduce the problem of hemiplasy, our results suggest that it will be difficult to completely avoid false inferences of convergence. We conclude by suggesting several ways forward in the analysis of convergent evolution, for both molecular and morphological characters. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
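To illustrate how a tree mismatch produces an apparent convergence, the sketch below counts the minimum number of substitutions a single site requires on two rooted topologies using Fitch parsimony. This is an illustration only, not the method used in the paper: the taxon names (A, B, C and outgroup O), the two topologies, and the site pattern are hypothetical. The site evolved with one substitution on the discordant gene tree, but needs two changes when mapped onto the species tree, which is the signature that would be misread as convergence.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes on a rooted binary tree (Fitch parsimony).

    tree   -- nested 2-tuples with taxon names (str) at the tips
    states -- dict mapping each taxon name to its observed character state
    """
    def visit(node):
        # A tip contributes its observed state and no changes.
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:
            # Children agree on at least one state: no extra change needed.
            return shared, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical topologies: the locus follows a tree discordant with the species tree.
species_tree = ((("A", "B"), "C"), "O")
gene_tree = ((("A", "C"), "B"), "O")

# Hypothetical site pattern: A and C share the derived state T.
site = {"A": "T", "B": "G", "C": "T", "O": "G"}

changes_on_gene_tree = fitch_changes(gene_tree, site)
changes_on_species_tree = fitch_changes(species_tree, site)
```

On the gene tree the shared state in A and C is explained by a single substitution on their common ancestral branch; on the species tree the same pattern requires two changes. The same logic applies within a locus: if recombination means different sites follow different topologies, any single locus-level tree will still be the wrong tree for some of them.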