Department of Ecology and Evolutionary Biology, Brown University, Box G-W, Providence, RI 02912, USA.
Department of Microbiology and Plant Biology and Oklahoma Biological Survey, University of Oklahoma, 770 Van Vleet Oval, Norman, OK 73019, USA.
Syst Biol. 2018 May 1;67(3):367-383. doi: 10.1093/sysbio/syx078.
Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here, we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the "portullugo" (Caryophyllales), a moderately sized lineage of flowering plants (~2200 species) that includes the cacti and harbors many evolutionary transitions to C$_{\mathrm{4}}$ and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C$_{\mathrm{4}}$ and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C$_{\mathrm{4}}$ and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75-218 loci across 74 taxa, with ~50% matrix completeness across data sets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.