Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK.
Syst Biol. 2018 Sep 1;67(5):916-924. doi: 10.1093/sysbio/syy043.
Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that fewer than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Herein, we examined two data sets in which supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each data set. When these genes were removed from each data set, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate data set have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from the plant data set did not exhibit any obvious systematic error and may therefore be the result of some biological process yet to be determined. While topological comparisons among a small set of alternative topologies can be helpful in discovering outlier genes, they are limited in several ways, such as by assuming that all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods also often assume that conflict is the result of incomplete lineage sorting. Herein, we explored a framework for large phylogenomic data sets that allows alternative edges and their support to be examined quickly and that does not assume a single topology for all genes. For both data sets, these analyses provided detailed results confirming the support for the coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic data sets by asking more targeted, edge-based questions.
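The outlier-gene idea described above can be illustrated with a minimal sketch. It is not the authors' framework or code. It assumes that per-gene log-likelihoods have already been computed under two candidate topologies (labelled A for the supermatrix tree and B for the coalescent tree), and every gene name and value below is hypothetical, chosen only to show how two genes can dominate the summed signal.

```python
from typing import Dict, Iterable, List

def gene_wise_delta(lnl_a: Dict[str, float], lnl_b: Dict[str, float]) -> Dict[str, float]:
    """Per-gene log-likelihood difference: positive values favor topology A."""
    return {gene: lnl_a[gene] - lnl_b[gene] for gene in lnl_a}

def favored_topology(delta: Dict[str, float], exclude: Iterable[str] = ()) -> str:
    """Topology favored by the summed differences, optionally ignoring some genes."""
    skipped = set(exclude)
    total = sum(d for gene, d in delta.items() if gene not in skipped)
    return "A" if total > 0 else "B"

def top_outliers(delta: Dict[str, float], n: int = 2) -> List[str]:
    """The n genes with the largest absolute log-likelihood difference."""
    return sorted(delta, key=lambda gene: abs(delta[gene]), reverse=True)[:n]

# Hypothetical per-gene log-likelihoods (illustrative values, not real data).
lnl_supermatrix = {"g1": -1000.0, "g2": -1200.0, "g3": -905.0,
                   "g4": -804.0, "g5": -1106.0, "g6": -703.0}
lnl_coalescent = {"g1": -1050.0, "g2": -1240.0, "g3": -900.0,
                  "g4": -800.0, "g5": -1100.0, "g6": -700.0}

delta = gene_wise_delta(lnl_supermatrix, lnl_coalescent)
outliers = top_outliers(delta, n=2)

print("outlier genes:", outliers)
print("favored with all genes:", favored_topology(delta))
print("favored without outliers:", favored_topology(delta, exclude=outliers))
```

In this toy example, four of the six genes mildly prefer topology B, yet the summed signal favors A until the two largest-difference genes are set aside. This mirrors the pattern the abstract reports, under the simplifying assumption that each gene is scored against only two fixed topologies.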