Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia.
Department of Ecology, Environment and Evolution, La Trobe University, Melbourne, Vic., Australia.
Mol Ecol Resour. 2021 May;21(4):1118-1140. doi: 10.1111/1755-0998.13327. Epub 2021 Feb 16.
The Orchidaceae comprise over 25,000 species, yet the drivers of their diversity remain to be fully understood. Here, we outline a multitiered sequence capture strategy aimed at capturing hundreds of loci to enable phylogenetic resolution from subtribe to subspecific levels in orchids of the tribe Diurideae. For the probe design, we mined subsets of 18 transcriptomes to give five target sequence sets aimed at the tribe (Sets 1 & 2), subtribe (Set 3), and within-subtribe levels (Sets 4 & 5). Analysis included alternative de novo and reference-guided assembly, followed by target sequence extraction, annotation and alignment, and application of a homology-aware k-mer block phylogenomic approach, prior to maximum likelihood and coalescence-based phylogenetic inference. Our evaluation considered 87 taxa in two test data sets: 67 samples spanning the tribe, and 72 samples involving 24 closely related Caladenia species. The tiered design achieved high target loci recovery (>89%), with the median number of recovered loci in Sets 1-5 as follows: 212, 219, 816, 1024, and 1009, respectively. Interestingly, as a first test of the homologous k-mer approach for targeted sequence capture data, our study revealed its potential for enabling robust phylogenetic species tree inferences. Specifically, we found matching, and in one case improved, phylogenetic resolution within species complexes, compared to conventional phylogenetic analysis involving target gene extraction. Our findings indicate that a customized multitiered sequence capture strategy, in combination with promising yet underutilized phylogenomic approaches, will be effective for groups where interspecific divergence is recent, but information on deeper phylogenetic relationships is also required.