Ruane Sara, Raxworthy Christopher J, Lemmon Alan R, Lemmon Emily Moriarty, Burbrink Frank T
Department of Herpetology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA.
Department of Biology, Florida State University, 319 Stadium Drive, P.O. Box 3064295, Tallahassee, FL, 32306-4295, USA.
BMC Evol Biol. 2015 Oct 12;15:221. doi: 10.1186/s12862-015-0503-1.
Using molecular data generated by high-throughput next-generation sequencing (NGS) platforms to infer phylogeny is becoming common as costs go down and the ability to capture loci from across the genome goes up. While there is a general consensus that greater numbers of independent loci should result in more robust phylogenetic estimates, few studies have compared phylogenies resulting from smaller datasets for commonly used genetic markers with the large datasets captured using NGS. Here, we determine how a 5-locus Sanger dataset compares with a 377-locus anchored genomics dataset for understanding the evolutionary history of the pseudoxyrhophiine snake radiation centered in Madagascar. The Pseudoxyrhophiinae comprise ~86% of Madagascar's serpent diversity, yet they are poorly known with respect to ecology, behavior, and systematics. Using the 377-locus NGS dataset and the summary statistics species-tree methods STAR and MP-EST, we estimated a well-supported species tree that provides new insights concerning intergeneric relationships for the pseudoxyrhophiines. We also compared how these and other methods performed with respect to estimating tree topology using datasets with varying numbers of loci.
Using Sanger sequencing and an anchored phylogenomics approach, we sequenced datasets comprising 5 and 377 loci, respectively, for 23 pseudoxyrhophiine taxa. For each dataset, we estimated phylogenies using both gene-tree (concatenation) and species-tree (STAR, MP-EST) approaches. We determined the similarity of resulting tree topologies from the different datasets using Robinson-Foulds distances. In addition, we examined how subsets of these data performed, in terms of phylogenetic accuracy, compared to the complete Sanger and anchored datasets using the same tree inference methodologies. We also used the program *BEAST to determine whether a full coalescent model for species-tree estimation could generate robust results with fewer loci than the summary statistics species-tree approaches. Finally, we examined the individual gene trees in comparison to the 377-locus species tree using the program MetaTree.
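As an illustration of the Robinson-Foulds comparison mentioned above, the following is a minimal sketch in plain Python. It is not the software used in the study (the abstract does not name it); the function names `parse_newick`, `bipartitions`, and `rf_distance` and the toy five-taxon trees are hypothetical. The sketch treats trees as unrooted, ignores branch lengths, and counts the non-trivial bipartitions found in one tree but not the other.

```python
import re


def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names.

    Branch lengths and internal node labels are discarded; only the
    topology is kept. Assumes well-formed input.
    """
    tokens = re.findall(r"[(),]|[^(),;]+", text.strip())
    pos = 0

    def subtree():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [subtree()]
            while tokens[pos] == ",":
                pos += 1
                children.append(subtree())
            pos += 1  # consume ")"
            # Skip an optional internal label or branch length.
            if pos < len(tokens) and tokens[pos] not in {"(", ")", ","}:
                pos += 1
            return tuple(children)
        name = tokens[pos].split(":")[0].strip()
        pos += 1
        return name

    return subtree()


def bipartitions(tree):
    """Return the set of non-trivial bipartitions (splits) of an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same representation.
    """
    splits = set()
    clades = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        clades.append(clade)
        return clade

    all_taxa = visit(tree)
    reference = min(all_taxa)
    for clade in clades:
        # Keep only splits with at least two taxa on each side.
        if 1 < len(clade) < len(all_taxa) - 1:
            side = clade if reference not in clade else all_taxa - clade
            splits.add(side)
    return splits


def rf_distance(newick_a, newick_b):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    a = bipartitions(parse_newick(newick_a))
    b = bipartitions(parse_newick(newick_b))
    return len(a ^ b)


if __name__ == "__main__":
    # Toy trees with hypothetical taxon labels, not data from the study.
    tree_1 = "((A,B),(C,D),E);"
    tree_2 = "((A,C),(B,D),E);"
    print(rf_distance(tree_1, tree_1))
    print(rf_distance(tree_1, tree_2))
```

A distance of zero means the two trees share every non-trivial split, that is, they have the same unrooted topology; larger values indicate more conflicting splits. The two trees must contain the same set of taxa for the comparison to be meaningful.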
Using the full anchored dataset under a variety of methods gave us the same, well-supported phylogeny for pseudoxyrhophiines. The African pseudoxyrhophiine Duberria is the sister taxon to the Malagasy pseudoxyrhophiine genera, providing evidence for a monophyletic radiation in Madagascar. In addition, within Madagascar, the two major clades inferred correspond largely to the aglyphous and opisthoglyphous genera, suggesting that feeding specializations associated with tooth venom delivery may have played a major role in the early diversification of this radiation. The comparison of tree topologies from the concatenated and species-tree methods using different datasets indicated that the 5-locus dataset cannot be used to infer a correct phylogeny for the pseudoxyrhophiines under any method tested here, and that summary statistics methods require 50 or more loci to consistently recover the species tree inferred using the complete anchored dataset. However, as few as 15 loci may infer the correct topology when using the full coalescent species-tree method *BEAST. MetaTree analyses of each gene tree from the Sanger and anchored datasets found that none of the individual gene trees matched the 377-locus species tree, and that no two gene trees were identical with respect to topology.
Our results suggest that ≥50 loci may be necessary to confidently infer phylogenies when using summary species-tree methods, but that the coalescent-based method *BEAST consistently recovers the same topology using only 15 loci. These results reinforce that datasets with small numbers of markers may result in misleading topologies, and further, that the method of inference used to generate a phylogeny also has a major influence on the number of loci necessary to infer robust species trees.