Department of Biology, Pennsylvania State University, University Park, PA, USA.
Evolution, Ecology, and Organismal Biology Department, University of California Riverside, Riverside, CA, USA.
Mol Ecol. 2021 Dec;30(23):5994-6005. doi: 10.1111/mec.15940. Epub 2021 May 19.
Researchers seeking to generate genomic data for non-model organisms are faced with a number of trade-offs when deciding which method to use. The selection of reduced representation approaches versus whole genome resequencing will ultimately affect the marker density, sequencing depth, and the number of individuals that can be multiplexed. These factors can affect researchers' ability to accurately characterize certain genomic features, such as landscapes of divergence (how FST varies across the genome). To provide insight into the effect of sequencing method on the estimation of divergence landscapes, we applied an identical bioinformatic pipeline to three generations of sequencing data (GBS, ddRAD, and WGS) produced for the same system, the yellow-rumped warbler species complex. We compare divergence landscapes generated using each method for the myrtle warbler (Setophaga coronata coronata) and the Audubon's warbler (S. c. auduboni), and for Audubon's warblers with deeply divergent mtDNA resulting from mitochondrial introgression. We found that most high-FST peaks were not detected in the ddRAD data set, and that while both GBS and WGS were able to identify the presence of large peaks, WGS was superior at a finer scale. Comparing Audubon's warblers with divergent mitochondrial haplotypes, only WGS allowed us to identify small (10-20 kb) regions of elevated differentiation, one of which contained the nuclear-encoded mitochondrial gene NDUFAF3. We calculated the cost per base pair for each method and found it was comparable between GBS and WGS, but significantly higher for ddRAD. These comparisons highlight the advantages of WGS over reduced representation methods when characterizing landscapes of divergence.
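As an illustration of what a divergence landscape is computationally, the following is a minimal sketch of windowed FST between two populations. It assumes Hudson's estimator combined across SNPs as a ratio of sums, which is one common choice and not necessarily the estimator or software used in the study. The function names, the input tuple layout, and the window size are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def hudson_fst_terms(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2: frequency of the same allele in population 1 and population 2.
    n1, n2: number of sampled chromosomes in each population (must be > 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(snps, window_size):
    """Average FST in non-overlapping windows along one chromosome.

    snps: iterable of (position, p1, p2, n1, n2) tuples (hypothetical layout).
    Returns {window_start: fst}. Windows with no informative SNPs are absent.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for pos, p1, p2, n1, n2 in snps:
        num, den = hudson_fst_terms(p1, p2, n1, n2)
        start = (pos // window_size) * window_size
        sums[start][0] += num
        sums[start][1] += den
    # Ratio of sums per window, skipping windows whose denominator is zero.
    return {s: num / den for s, (num, den) in sorted(sums.items()) if den > 0}


# Made-up toy input: one fixed difference and one SNP with equal frequencies.
toy_snps = [(5, 1.0, 0.0, 20, 20), (15000, 0.5, 0.5, 20, 20)]
landscape = windowed_fst(toy_snps, window_size=10_000)
```

In a sketch like this, windows that contain no markers simply do not appear in the result, which is one way to see why marker density limits the scale at which narrow peaks can be detected.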