Departamento de Botânica, Universidade de São Paulo, Rua do Matão 277, Cidade Universitária, CEP 05508-900, São Paulo, São Paulo, Brazil.
Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA.
Mol Ecol Resour. 2017 Nov;17(6):1136-1147. doi: 10.1111/1755-0998.12654. Epub 2017 Feb 10.
High-throughput DNA sequencing facilitates the analysis of large portions of the genome in nonmodel organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double-digest restriction-site-associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a nonmodel plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and biallelic markers should be sampled for accurate estimates of intra- and interpopulation genetic diversity. We identified 3646 and 4900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, increasing the sample size beyond eight individuals has little impact on estimates of genetic diversity within A. longifolia populations when 1000 or more SNPs are used. Our results also show that even at a very small sample size (i.e. two individuals), accurate estimates of F_ST can be obtained with a large number of SNPs (≥1500). These results highlight the potential of high-throughput genomic sequencing approaches to address questions related to evolutionary biology in nonmodel organisms. Furthermore, our findings also provide insights into the optimization of sampling strategies in the era of population genomics.
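The resampling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the genotype coding (0, 1 or 2 copies of the alternate allele), the function names `expected_heterozygosity` and `resample_he`, the toy data, and the choice of uncorrected expected heterozygosity (2pq) as the diversity statistic are all assumptions made for illustration only.

```python
import random
from statistics import mean, stdev


def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity (2pq) across biallelic SNPs.

    genotypes: list of individuals, each a list of alternate-allele
    counts (0, 1 or 2), one entry per SNP. No small-sample correction
    is applied (an illustrative simplification).
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    total = 0.0
    for j in range(n_snp):
        # Alternate-allele frequency at SNP j (diploid: 2 alleles per individual).
        p = sum(ind[j] for ind in genotypes) / (2 * n_ind)
        total += 2 * p * (1 - p)
    return total / n_snp


def resample_he(genotypes, n_ind, n_snp, replicates=100, seed=0):
    """Draw random subsets of individuals and SNPs (without replacement)
    and return one heterozygosity estimate per replicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        inds = rng.sample(range(len(genotypes)), n_ind)
        snps = rng.sample(range(len(genotypes[0])), n_snp)
        subset = [[genotypes[i][j] for j in snps] for i in inds]
        estimates.append(expected_heterozygosity(subset))
    return estimates


# Hypothetical toy population: 30 individuals, 2000 SNPs (not real data).
toy_rng = random.Random(42)
freqs = [toy_rng.uniform(0.05, 0.95) for _ in range(2000)]
population = [
    [(toy_rng.random() < p) + (toy_rng.random() < p) for p in freqs]
    for _ in range(30)
]

full = expected_heterozygosity(population)
for n in (2, 4, 8, 16):
    est = resample_he(population, n_ind=n, n_snp=1000)
    print(f"n={n:2d}  mean He={mean(est):.4f}  sd={stdev(est):.4f}  full={full:.4f}")
```

Comparing the spread of the replicate estimates against the full-data value for each subset size is one simple way to see where adding individuals stops changing the estimate; the same loop could be repeated over different SNP counts.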