Plant Biology Section, Plant Breeding & Genetics Section, and L. H. Bailey Hortorium, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA.
Syst Biol. 2022 Feb 10;71(2):476-489. doi: 10.1093/sysbio/syab053.
The species tree paradigm that dominates current molecular systematic practice infers species trees from collections of sequences under assumptions of the multispecies coalescent (MSC), that is, that there is free recombination between the sequences and no (or very low) recombination within them. These coalescent genes (c-genes) are thus defined in an historical rather than molecular sense and can in theory be as large as an entire genome or as small as a single nucleotide. A debate about how to define c-genes centers on the contention that nuclear gene sequences used in many coalescent analyses undergo too much recombination, such that their introns comprise multiple c-genes, violating a key assumption of the MSC. Recently a similar argument has been made for the genes of plastid (e.g., chloroplast) and mitochondrial genomes, which for the last 30 or more years have been considered to represent a single c-gene for the purposes of phylogeny reconstruction because they are nonrecombining in an historical sense. Consequently, it has been suggested that these genomes should be analyzed using coalescent methods that treat their genes (over 70 protein-coding genes in the case of most plastid genomes, or plastomes) as independent estimates of species phylogeny, in contrast to the usual practice of concatenation, which is appropriate for generating gene trees. However, although recombination certainly occurs in the plastome, as has been recognized since the 1970s, it is unlikely to be phylogenetically relevant. This is because such historically effective recombination can only occur when plastomes with incongruent histories are brought together in the same plastid. However, plastids sort rapidly into different cell lineages and rarely fuse. Thus, because of plastid biology, the plastome is a more canonical c-gene than is the average multi-intron mammalian nuclear gene. The plastome should thus continue to be treated as a single estimate of the underlying species phylogeny, as should the mitochondrial genome. The implications of this long-held insight of molecular systematics for studies in the phylogenomic era are explored. [c-gene; coalescent gene; concatalescence; organelle genome; plastome; recombination; species tree.]