Department of Biological Sciences, Eastern Kentucky University, Richmond, KY 40475, USA.
BMC Evol Biol. 2014 Feb 17;14:23. doi: 10.1186/1471-2148-14-23.
Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have helped resolve the phylogeny of numerous clades (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome sequence data.
We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there also is evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotides versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa.
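The GC-content variation noted above can be inspected directly from the coding sequences. The following is a minimal sketch, not the authors' pipeline: it assumes in-frame coding sequences held as plain strings, and the function names and toy sequences are hypothetical. It reports overall GC content and GC content at each codon position, skipping gaps and ambiguity codes.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases.

    Gaps and ambiguity codes are ignored; returns NaN if no A/C/G/T is present.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def gc_by_codon_position(cds: str) -> list[float]:
    """GC content at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    return [gc_content(cds[i::3]) for i in range(3)]


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy = {"taxon_A": "ATGGCCGCGTAA", "taxon_B": "ATGGCTGCATAA"}
    for name, cds in toy.items():
        print(name, gc_content(cds), gc_by_codon_position(cds))
```

Comparing these values across taxa is one simple way to flag lineages whose base composition departs strongly from the rest of the data set.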
Analyses of the plastid sequence data recovered a strongly supported framework of relationships for green plants. This framework includes: i) the placement of Zygnematophyceae as sister to land plants (Embryophyta), ii) a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining extant gymnosperms and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees), and iii) within the monilophyte clade (Monilophyta), Equisetales + Psilotales are sister to Marattiales + leptosporangiate ferns. Our analyses also highlight the challenges of using plastid genome sequences in deep-level phylogenomic analyses, and we provide suggestions for future analyses that will likely incorporate plastid genome sequence data for thousands of species. We particularly emphasize the importance of exploring the effects of different partitioning and character coding strategies.
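As an illustration of what an alternative character coding strategy can look like, the sketch below applies RY-coding, a widely used recoding that collapses purines (A, G) to R and pyrimidines (C, T) to Y so that only transversions remain informative. The abstract does not name the coding schemes that were compared, so this is offered as an assumed example only; the function name is hypothetical.

```python
# Map each nucleotide to its purine (R) or pyrimidine (Y) class.
# Gaps, N and other symbols are left unchanged.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(seq: str) -> str:
    """Return the RY-coded version of a nucleotide sequence."""
    return seq.upper().translate(RY_TABLE)


if __name__ == "__main__":
    # Hypothetical aligned fragment, for illustration only.
    print(ry_code("atgc-n"))
```

Because the recoding discards the difference between A and G and between C and T, it can reduce the influence of base-composition bias at the cost of some phylogenetic signal.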