Breinholt Jesse W, Earl Chandra, Lemmon Alan R, Lemmon Emily Moriarty, Xiao Lei, Kawahara Akito Y
Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA.
RAPiD Genomics, Gainesville, FL 32601, USA.
Syst Biol. 2018 Jan 1;67(1):78-93. doi: 10.1093/sysbio/syx048.
The advent of next-generation sequencing technology has allowed for the collection of large portions of the genome for phylogenetic analysis. Hybrid enrichment and transcriptomics are two techniques that leverage next-generation sequencing and have shown much promise. However, methods for processing hybrid enrichment data are still limited. We developed a pipeline for anchored hybrid enrichment (AHE) read assembly, orthology determination, contamination screening, and data processing for sequences flanking the target "probe" region. We apply this approach to study the phylogeny of butterflies and moths (Lepidoptera), a megadiverse group of more than 157,000 described species with poorly understood deep-level phylogenetic relationships. We introduce a new 855-locus AHE kit for Lepidoptera phylogenetics and compare the resulting trees to those from transcriptomes. The enrichment kit was designed from existing genomes, transcriptomes, and expressed sequence tags and was used to capture sequence data from 54 species from 23 lepidopteran families. Phylogenies estimated from AHE data were largely congruent with trees generated from transcriptomes, with strong support for relationships at all but the deepest taxonomic levels. We combine AHE and transcriptomic data to generate a new Lepidoptera phylogeny representing 76 exemplar species in 42 families. The tree provides robust support for many relationships, including those among the seven butterfly families. The addition of AHE data to an existing transcriptomic dataset lowers node support along the Lepidoptera backbone but firmly places taxa with AHE data on the phylogeny. Combining taxa sequenced for AHE with existing transcriptomes and genomes resulted in a tree with strong support for (Calliduloidea + Gelechioidea + Thyridoidea) + (Papilionoidea + Pyraloidea + Macroheterocera).
To examine the efficacy of AHE at a shallow taxonomic level, phylogenetic analyses were also conducted on a sister group representing a more recent divergence: Saturniidae and Sphingidae. These analyses used sequences from the probe region and the data flanking it, which nearly doubled the size of the dataset; the resulting trees supported new phylogenetic relationships, especially within Saturniidae and Sphingidae (e.g., Hemarina derived within the latter). We hope that our data processing pipeline, hybrid enrichment gene set, and approach of combining AHE data with transcriptomes will be useful for the broader systematics community.