RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling.

Author information

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.

Publication information

Mol Ecol. 2013 Jun;22(11):3179-90. doi: 10.1111/mec.12276. Epub 2013 Apr 3.

Abstract

Reduced representation genome-sequencing approaches based on restriction digestion are enabling large-scale marker generation and facilitating genomic studies in a wide range of model and nonmodel systems. However, sampling chromosomes based on restriction digestion may introduce a bias in allele frequency estimation due to polymorphisms in restriction sites. To explore the effects of this nonrandom sampling and its sensitivity to different evolutionary parameters, we developed a coalescent-simulation framework to mimic the biased recovery of chromosomes in restriction-based short-read sequencing experiments (RADseq). We analysed simulated DNA sequence datasets and compared known values from simulations with those that would be estimated using a RADseq approach from the same samples. We compare these 'true' and 'estimated' values of commonly used summary statistics, π, θ_W, Tajima's D and F_ST. We show that loci with missing haplotypes have estimated summary statistic values that can deviate dramatically from true values and are also enriched for particular genealogical histories. These biases are sensitive to nonequilibrium demography, such as bottlenecks and population expansion. In silico digests with 102 completely sequenced Drosophila melanogaster genomes yielded results similar to our findings from coalescent simulations. Though the potential of RADseq for marker discovery and trait mapping in nonmodel systems remains undisputed, our results urge caution when applying this technique to make population genetic inferences.
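
The comparison of 'true' and 'estimated' statistics described in the abstract can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the authors' coalescent framework: the eight haplotypes are hand-built rather than simulated, the function names are hypothetical, and allele dropout is modelled by a simple per-haplotype flag saying whether the restriction site is intact. Haplotypes whose cut site is mutated are treated as unrecovered, and π, Watterson's θ and Tajima's D are computed before and after dropping them.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(haps):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*haps) if len(set(column)) > 1)


def pairwise_diversity(haps):
    """pi: mean number of differences over all pairs of haplotypes."""
    n = len(haps)
    if n < 2:
        return 0.0
    total = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haps, 2)
    )
    return total / (n * (n - 1) / 2)


def watterson_theta(haps):
    """Watterson's estimator: S divided by the harmonic number a1."""
    n = len(haps)
    if n < 2:
        return 0.0
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(haps) / a1


def tajimas_d(haps):
    """Tajima's D (Tajima 1989); None when it is undefined (n < 4 or S == 0)."""
    n, s = len(haps), segregating_sites(haps)
    if n < 4 or s == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pairwise_diversity(haps) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy sample (assumption: hand-built, not simulated). Each entry is
# (cut_site_intact, haplotype); '1' marks a derived allele at a SNP.
# The cut-site mutation sits on the branch leading to the second clade,
# so all four haplotypes of that clade are lost together.
sample = [
    (True, "001100"),
    (True, "000100"),
    (True, "000000"),
    (True, "000000"),
    (False, "110010"),
    (False, "110000"),
    (False, "110001"),
    (False, "110001"),
]

true_haps = [seq for _, seq in sample]
observed_haps = [seq for intact, seq in sample if intact]

for label, haps in (("true", true_haps), ("RADseq-like", observed_haps)):
    print(
        label,
        "n =", len(haps),
        "S =", segregating_sites(haps),
        "pi =", round(pairwise_diversity(haps), 3),
        "theta_W =", round(watterson_theta(haps), 3),
        "D =", tajimas_d(haps),
    )
```

In this toy case the dropped haplotypes form one clade, so the recovered subset lacks the deepest branch of the genealogy and both diversity estimates fall below their full-sample values. A real analysis would draw genealogies from a coalescent simulator and place the restriction-site mutation on a branch at random.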

