Therkildsen Nina Overgaard, Palumbi Stephen R
Hopkins Marine Station, Department of Biology, Stanford University, 120 Oceanview Blvd., Pacific Grove, CA, 93950, USA.
Mol Ecol Resour. 2017 Mar;17(2):194-208. doi: 10.1111/1755-0998.12593. Epub 2016 Aug 29.
Today most population genomic studies of nonmodel organisms either sequence a subset of the genome deeply in each individual or sequence pools of unlabelled individuals. With a step-by-step workflow, we illustrate how low-coverage whole-genome sequencing of hundreds of individually barcoded samples is now a practical alternative strategy for obtaining genomewide data on a population scale. We used a highly efficient protocol to generate high-quality libraries for ~6.5 USD from each of 876 Atlantic silversides (a teleost fish with a genome size ~730 Mb) that we sequenced to 1-4× genome coverage. In the absence of a reference genome, we developed a bioinformatic pipeline for mapping the genomic reads to a de novo assembled reference transcriptome. This provides an 'in silico' method for exome capture that avoids the complexities and expenses of using wet chemistry for target isolation. Using novel tools for analysis of low-coverage data, we extracted population allele frequencies, individual genotype likelihoods and polymorphism data for 2 504 335 SNPs across the exome for the 876 fish. To illustrate the use of the resulting data, we present a preliminary analysis of geographical patterns in the exome data and a comparison of complete mitochondrial genome sequences for each individual (constructed from the low-coverage data) that show population colonization patterns along the US east coast. With a total cost per sample of less than 50 USD (including sequencing) and ability to prepare 96 libraries in only 5 h, our approach adds a viable new option to the population genomics toolbox.
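The abstract notes that low-coverage data are analysed through genotype likelihoods and population allele frequencies rather than hard genotype calls. As a rough illustration of that idea only, and not the authors' pipeline or any specific tool's implementation, the sketch below computes likelihoods for the three genotypes at a biallelic site from reference and alternate read counts, then estimates the population allele frequency with a simple EM under Hardy-Weinberg proportions. The function names, the fixed per-read error rate of 0.01 and the Hardy-Weinberg prior are all assumptions made for this example.

```python
from math import comb


def genotype_likelihoods(n_ref, n_alt, err=0.01):
    """Likelihood of the reads for genotypes carrying 0, 1 or 2 alternate alleles.

    Assumes a biallelic site and one symmetric per-read error rate (hypothetical
    simplification; real tools use per-base quality scores).
    """
    gls = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternate allele given genotype g.
        p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err
        gls.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return gls


def em_allele_frequency(read_counts, err=0.01, n_iter=50, f_init=0.5):
    """EM estimate of the alternate allele frequency from (n_ref, n_alt) pairs.

    Uses a Hardy-Weinberg genotype prior (an assumption of this sketch).
    """
    gls = [genotype_likelihoods(r, a, err) for r, a in read_counts]
    f = f_init
    for _ in range(n_iter):
        expected_alt = 0.0
        for gl in gls:
            prior = [comb(2, g) * f ** g * (1 - f) ** (2 - g) for g in (0, 1, 2)]
            post = [lik * p for lik, p in zip(gl, prior)]
            total = sum(post)
            # Expected number of alternate alleles carried by this individual.
            expected_alt += sum(g * w for g, w in enumerate(post)) / total
        f = expected_alt / (2 * len(gls))
    return f


if __name__ == "__main__":
    # Two reference reads and no alternate reads: typical 1-4x coverage.
    print(genotype_likelihoods(2, 0))
    # Five individuals, each with a handful of reads at the site.
    print(em_allele_frequency([(2, 0), (1, 1), (0, 3), (3, 0), (1, 0)]))
```

With only two or three reads per individual, the heterozygote often keeps substantial likelihood, which is why pooling information across many individuals gives more reliable frequency estimates than calling each genotype separately.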