Department of Integrative Biology, The University of Texas at Austin, 2415 Speedway #C0930, Austin, TX 78713, USA.
Mol Phylogenet Evol. 2019 Sep;138:219-232. doi: 10.1016/j.ympev.2019.05.022. Epub 2019 May 28.
The current classification of angiosperms is based primarily on concatenated plastid markers and maximum likelihood (ML) inference. This approach has been justified by the assumption that plastid DNA (ptDNA) is inherited as a single locus and that its individual genes produce congruent trees. However, structural and functional characteristics of ptDNA suggest that plastid genes may not evolve as a single locus and may be subject to different evolutionary forces. To examine this idea, we produced new complete plastid genome (plastome) sequences of 27 species and combined these data with publicly available sequences to produce a final dataset that includes 78 plastid genes for 89 species of rosids and five outgroups. We used four data matrices (i.e., gene, exon, codon-aligned, and amino acid) to infer species and gene trees using ML and multispecies coalescent (MSC) methods. Rosids include about one third of all angiosperms, and their two major clades, fabids and malvids, were recovered in almost all analyses. However, we detected incongruence among species trees inferred with different matrices and methods, and between these trees and previously published plastid and nuclear phylogenies. We visualized and tested the significance of incongruence between gene trees and species trees. We then measured how the phylogenetic signal supporting alternative placements of five controversial nodes at different taxonomic levels is distributed across sites and genes. Gene trees inferred with plastid data often disagree with species trees inferred using both ML (with unpartitioned or partitioned data) and MSC. Species trees inferred with the two methods produced alternative topologies for a few taxa. Our results show that, in a phylogenetic context, plastid protein-coding genes may not be fully linked and may not behave as a single locus. Furthermore, concatenated matrices may produce highly supported phylogenies that are discordant with individual gene trees.
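To make the idea of gene-tree/species-tree incongruence concrete, the following minimal sketch counts the clades that differ between a gene tree and a species tree (a Robinson-Foulds-style distance). It is an illustration only, not the significance test used in the study: the trees, taxon names, and the nested-tuple representation are hypothetical, and it compares rooted clades rather than the unrooted bipartitions that most phylogenetics software reports.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a rooted tree as nested tuples of taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return every non-trivial clade (more than one leaf, fewer than all leaves)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])  # single leaves are trivial and not recorded
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these taxa
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds distance: clades present in one tree but not the other."""
    return len(clusters(t1) ^ clusters(t2))


# Hypothetical species tree and gene tree over the same five taxa.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))

print(rf_distance(species_tree, species_tree))  # 0: identical topology
print(rf_distance(species_tree, gene_tree))  # 2: clade (A,B) vs. clade (A,C)
```

A distance of zero means that the gene tree and the species tree share every clade. Larger values indicate more discordance, although a raw count like this says nothing about whether the disagreement is statistically significant.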
We also show that phylogenies inferred with MSC are accurate. We therefore emphasize the importance of considering variation in phylogenetic signal across plastid genes and of exploring plastome data to increase the accuracy of estimated relationships. We also support the use of MSC with plastome matrices in future phylogenomic investigations.
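A common way to quantify how signal for alternative placements of a node varies across sites and genes is to subtract per-site log-likelihoods under two competing topologies (ΔSLS) and sum those differences within each gene (ΔGLS). The abstract does not specify the exact statistic used, so the sketch below is only an assumed illustration of this general approach. It takes per-site log-likelihoods that an ML program would already have computed under a topology T1 and an alternative T2. The likelihood values, gene names, and gene boundaries are all hypothetical.

```python
from typing import Dict, List, Tuple


def delta_sls(lnl_t1: List[float], lnl_t2: List[float]) -> List[float]:
    """Per-site log-likelihood difference; positive values favour topology T1."""
    if len(lnl_t1) != len(lnl_t2):
        raise ValueError("site log-likelihood vectors must have equal length")
    return [a - b for a, b in zip(lnl_t1, lnl_t2)]


def delta_gls(dsls: List[float], genes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Sum per-site differences within each gene; genes maps name -> [start, end) sites."""
    return {name: sum(dsls[start:end]) for name, (start, end) in genes.items()}


# Hypothetical per-site log-likelihoods for an 8-site alignment under T1 and T2.
lnl_t1 = [-2.0, -1.5, -3.0, -2.5, -1.0, -4.0, -2.0, -1.5]
lnl_t2 = [-2.5, -1.5, -3.5, -3.0, -1.0, -3.0, -2.5, -1.0]
# Hypothetical gene boundaries (0-based, end-exclusive) within the concatenated matrix.
genes = {"geneX": (0, 4), "geneY": (4, 8)}

dsls = delta_sls(lnl_t1, lnl_t2)
dgls = delta_gls(dsls, genes)
for name, value in dgls.items():
    favoured = "T1" if value > 0 else "T2" if value < 0 else "neither"
    print(f"{name}: dGLS = {value:+.2f} (favours {favoured})")
print(f"concatenated total: {sum(dsls):+.2f}")
```

In this made-up example, geneX favours T1 (+1.50) and geneY favours T2 (-1.00). The concatenated total (+0.50) still favours T1, which shows how a concatenated matrix can support one topology while individual genes disagree.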