Department of Biological Sciences, Smith College, Northampton, MA 01063, USA and Program in Organismic and Evolutionary Biology, UMass-Amherst, Amherst MA 01003, USA.
Syst Biol. 2015 May;64(3):406-15. doi: 10.1093/sysbio/syu126. Epub 2014 Dec 23.
Most eukaryotic lineages are microbial, and many have only recently been sampled for phylogenetic studies or remain in the "dark area" of the tree of life where there are no molecular data. To assess relationships among eukaryotic lineages, we perform a taxon-rich phylogenomic analysis including 232 eukaryotes selected to maximize taxonomic diversity and up to 1554 genes chosen as vertically inherited based on their broad distribution among eukaryotes. We also include sequences from 486 bacteria and 84 archaea to assess the impact of endosymbiotic gene transfer (EGT) from plastids and to detect contamination. Overall, our analyses are consistent with other less taxon-rich estimates of the eukaryotic tree of life, and we recover strong support for five major clades: Amoebozoa, Excavata (without the genus Malawimonas), Opisthokonta, Archaeplastida, and SAR (Stramenopila, Alveolata, and Rhizaria). Our analyses also highlight the existence of "orphan" lineages, lineages that lack robust placement in the eukaryotic tree of life, and indicate the possibility of as yet undiscovered diversity. In analyses including bacteria and archaea, we find that approximately 10% of the 1554 genes, which we choose because they are found in four or five of the five major eukaryotic clades and hence may be more likely to be inherited vertically, appear to have been acquired from cyanobacteria through EGT in photosynthetic lineages. Removing these EGT genes places the green algae as sister to the glaucophytes instead of the red algae, suggesting that unknowingly including genes of plastid origin, and combining them with genes of nuclear origin, may mislead phylogenetic estimates. Finally, the large size of our data set allows comparative analyses of subsets of data; alignments built from randomly sampled sites provide greater support, particularly for deep relationships, than do equivalent-sized data sets built from randomly sampled genes.
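The final comparison in the abstract, equal-sized data sets built from randomly sampled sites versus randomly sampled genes, can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy alignments, the dictionary layout, and the function names `concatenate`, `sample_sites`, and `sample_genes` are all hypothetical, and taxa missing from a gene are assumed to be filled with gap characters.

```python
import random

def concatenate(genes, taxa):
    """Join per-gene alignments into one supermatrix; absent taxa become gaps."""
    matrix = {t: [] for t in taxa}
    for aln in genes.values():
        length = len(next(iter(aln.values())))  # all rows in a gene share one length
        for t in taxa:
            matrix[t].append(aln.get(t, "-" * length))
    return {t: "".join(parts) for t, parts in matrix.items()}

def sample_sites(genes, taxa, n_sites, rng):
    """Draw n_sites columns at random (without replacement) from the supermatrix."""
    full = concatenate(genes, taxa)
    total = len(full[taxa[0]])
    cols = sorted(rng.sample(range(total), n_sites))
    return {t: "".join(seq[c] for c in cols) for t, seq in full.items()}

def sample_genes(genes, taxa, n_sites, rng):
    """Add whole genes in random order until at least n_sites columns are reached."""
    order = list(genes)
    rng.shuffle(order)
    chosen, length = {}, 0
    for name in order:
        if length >= n_sites:
            break
        chosen[name] = genes[name]
        length += len(next(iter(genes[name].values())))
    return concatenate(chosen, taxa)

# Toy input (hypothetical): three short gene alignments over three taxa.
taxa = ["A", "B", "C"]
genes = {
    "g1": {"A": "MKTL", "B": "MKSL", "C": "MRTL"},
    "g2": {"A": "GAVLI", "B": "GAVMI"},  # taxon C lacks this gene
    "g3": {"A": "PQR", "B": "PQK", "C": "PNR"},
}
rng = random.Random(0)
full = concatenate(genes, taxa)
by_site = sample_sites(genes, taxa, 6, rng)
by_gene = sample_genes(genes, taxa, 6, rng)
```

Site sampling draws columns from every gene in the supermatrix, whereas gene sampling keeps whole loci together, so a gene-sampled subset of the same size represents far fewer independent loci. Each subsampled alignment would then go to a tree-building program of choice to compare support values.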