Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Gothenburg, Sweden; and Museum of Archaeology, University of Stavanger, NO-4036 Stavanger, Norway
Syst Biol. 2015 May;64(3):448-71. doi: 10.1093/sysbio/syv004. Epub 2015 Jan 20.
There is a rising awareness that species trees are best inferred from multiple loci while taking into account processes affecting individual gene trees, such as substitution model error (failure of the model to account for the complexity of the data) and coalescent stochasticity (presence of incomplete lineage sorting [ILS]). Although most studies have been carried out in the context of dichotomous species trees, these processes operate also in more complex evolutionary histories involving multiple hybridizations and polyploidy. Recently, methods have been developed that accurately handle ILS in allopolyploids, but they are thus far restricted to networks of diploids and tetraploids. We propose a procedure that improves on this limitation by designing a workflow that assigns homoeologs to hypothetical diploid ancestral genomes prior to genome tree construction. Conflicting assignment hypotheses are evaluated against substitution model error and coalescent stochasticity. Incongruence that cannot be explained by stochastic mechanisms needs to be explained by other processes (e.g., homoploid hybridization or paralogy). The data can then be filtered to build multilabeled genome phylogenies using inference methods that can recover species trees, either in the face of substitution model error and coalescent stochasticity alone, or while simultaneously accounting for hybridization. Methods are already available for folding the resulting multilabeled genome phylogeny into a network. We apply the workflow to the reconstruction of the reticulate phylogeny of the plant genus Fumaria (Papaveraceae) with ploidal levels ranging from 2x to 14x. We describe the challenges in recovering nuclear NRPB2 homoeologs in high ploidy species while combining in vivo cloning and direct sequencing techniques.
Using parametric bootstrapping simulations we assign nuclear homoeologs and chloroplast sequences (four concatenated loci) to their common hypothetical diploid ancestral genomes. As these assignments hinge on effective population size assumptions, we investigate how varying these assumptions impacts the recovered multilabeled genome phylogeny.
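The dependence on effective population size can be illustrated with a minimal sketch. It is not the parametric bootstrapping procedure used in the study; it only evaluates the standard three-taxon coalescent result that a gene tree disagrees with the species tree through ILS alone with probability (2/3)·exp(-T), where T is the internal branch length in coalescent units. The function name, the diploid scaling T = t / (2·Ne), and the example values are assumptions chosen for illustration.

```python
import math


def discordance_probability(branch_generations: float, ne: float) -> float:
    """Probability that a three-taxon gene tree is discordant due to ILS alone.

    Assumes a diploid population of constant effective size `ne`, so that the
    internal species-tree branch of `branch_generations` generations has
    length T = branch_generations / (2 * ne) in coalescent units.
    """
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    if branch_generations < 0:
        raise ValueError("branch length must be non-negative")
    t_coalescent = branch_generations / (2.0 * ne)
    # With probability exp(-T) the two sister lineages fail to coalesce on the
    # internal branch; two of the three equally likely resolutions in the
    # ancestral population then conflict with the species tree.
    return (2.0 / 3.0) * math.exp(-t_coalescent)


if __name__ == "__main__":
    # Hypothetical internal branch of 100,000 generations.
    for ne in (1e4, 1e5, 1e6):
        p = discordance_probability(1e5, ne)
        print(f"Ne = {ne:.0e}: P(discordant gene tree) = {p:.4f}")
```

Under these assumptions, a larger effective population size makes the same amount of gene tree incongruence easier to attribute to coalescent stochasticity, which is why the assignment of homoeologs can change when the population size assumption is varied.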