Dupuis Julian R, Bremer Forest T, Kauwe Angela, San Jose Michael, Leblanc Luc, Rubinoff Daniel, Geib Scott M
U.S. Department of Agriculture-Agricultural Research Service, Daniel K. Inouye U.S. Pacific Basin Agricultural Research Center, Hilo, Hawaii.
Department of Plant and Environmental Protection Sciences, University of Hawaii at Manoa, Honolulu, Hawaii.
Mol Ecol Resour. 2018 Apr 6. doi: 10.1111/1755-0998.12783.
High-throughput sequencing has fundamentally changed how molecular phylogenetic data sets are assembled, and phylogenomic data sets commonly contain 50- to 100-fold more loci than those generated using traditional Sanger sequencing-based approaches. Here, we demonstrate a new approach for building phylogenomic data sets using single-tube, highly multiplexed amplicon sequencing, which we name HiMAP (highly multiplexed amplicon-based phylogenomics), and present bioinformatic pipelines for locus selection based on genomic and transcriptomic data resources and for postsequencing consensus calling and alignment. This method is inexpensive, amenable to sequencing a large number (hundreds) of taxa simultaneously, and requires minimal hands-on time at the bench (less than half a day); data analysis can be accomplished without read mapping or assembly. We demonstrate this approach by sequencing 878 amplicons in single reactions for 82 species of tephritid fruit flies across seven genera (384 individuals), including some of the most economically important agricultural insect pests. The resulting filtered data set (>150,000-bp concatenated alignment, ~20% missing character sites across all individuals and amplicons) contained >40,000 phylogenetically informative characters, and although some discordance was observed between analyses, it provided unparalleled resolution of many phylogenetic relationships in this group. Most notably, we found high support for the generic status of Zeugodacus and the sister relationship between Dacus and Zeugodacus. We discuss HiMAP with regard to its molecular and bioinformatic strengths and the insight the resulting data set provides into the relationships of this diverse insect group.
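The abstract summarizes the filtered data set by its share of missing character sites and its number of phylogenetically informative characters. As a rough illustration of how such summary statistics can be computed from a concatenated alignment, the Python sketch below counts both on a toy alignment. It is not the HiMAP pipeline: the function names, the toy taxa, and the choice to treat `-`, `?`, and `N` as missing are assumptions made for this example, and "informative" is assumed to mean parsimony-informative (at least two character states, each present in at least two sequences). Other IUPAC ambiguity codes are treated as ordinary states for simplicity.

```python
from collections import Counter

# Assumption: gaps, '?' and 'N' are counted as missing data.
MISSING = set("-?N")


def missing_fraction(alignment):
    """Fraction of all character cells in the alignment that are missing."""
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(
        ch.upper() in MISSING for seq in alignment.values() for ch in seq
    )
    return missing / total if total else 0.0


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(ch.upper() for ch in column if ch.upper() not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(alignment):
    """Count parsimony-informative columns in an aligned set of sequences."""
    seqs = list(alignment.values())
    if not seqs:
        return 0
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # zip(*seqs) iterates over alignment columns.
    return sum(is_parsimony_informative(col) for col in zip(*seqs))


# Hypothetical toy alignment: four taxa, eight aligned sites.
toy = {
    "taxon_A": "ACGTAC-T",
    "taxon_B": "ACGTACGT",
    "taxon_C": "ATGAACNT",
    "taxon_D": "ATGAAC?A",
}

print(count_informative_sites(toy))  # 2 (the second and fourth columns)
print(missing_fraction(toy))         # 0.09375 (3 of 32 cells)
```

In the toy alignment, the last column is variable but not parsimony-informative, because the state `A` occurs in only one taxon.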