Population genomics based on low coverage sequencing: how low should we go?

Research in molecular ecology is now often based on large numbers of DNA sequence reads. Given a fixed time and financial budget for DNA sequencing, the question arises of how to allocate the finite number of sequence reads among three dimensions: (i) sequencing individual nucleotide positions repeatedly to achieve high confidence in the true genotype of individuals, (ii) sampling larger numbers of individuals from a population, and (iii) sampling a larger fraction of the genome. Leaving aside the question of what fraction of the genome to sample, we analyze the trade-off between repeatedly sequencing the same nucleotide position (coverage depth) and the number of individuals in the sample. We review simple Bayesian models for allele frequencies and use them to analyze how to obtain maximal information about population genetic parameters. The models indicate that sampling larger numbers of individuals, at the expense of coverage depth per nucleotide position, provides more information about population parameters. Dividing the sequencing effort maximally among individuals, so that each individual receives approximately one read per locus (1× coverage), yields the most information about a population. Some analyses require genetic parameters for individuals, in which case Bayesian population models also support inference from lower coverage sequence data than simple likelihood models require. Low coverage sequencing is not only sufficient to support inference; designing studies around low coverage is optimal, because such studies yield highly accurate and precise parameter estimates based on more individuals or more sites in the genome.
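The trade-off between coverage depth and number of individuals can be illustrated with a small simulation. The sketch below is not the Bayesian model reviewed in the paper; it swaps in a much simpler estimator (the pooled fraction of reads carrying an allele) and assumes a single biallelic locus, diploid individuals in Hardy-Weinberg proportions, and no sequencing error. The function names `expected_variance` and `simulate_rmse` and all parameter values are hypothetical. Under these assumptions, with a read budget B split as N = B/c individuals at depth c, the estimator's variance works out to p(1-p)(c+1)/(2B), which is smallest at c = 1.

```python
import random


def expected_variance(p, budget, depth):
    """Variance of the pooled read-fraction estimate of allele frequency p.

    Assumes budget is divisible by depth, so budget // depth diploid
    individuals are each sequenced to exactly `depth` reads at one locus.
    """
    return p * (1 - p) * (depth + 1) / (2 * budget)


def simulate_rmse(p, budget, depth, replicates=200, seed=1):
    """Monte Carlo RMSE of the same estimator (illustrative assumptions only)."""
    rng = random.Random(seed)
    n_individuals = budget // depth
    squared_error = 0.0
    for _ in range(replicates):
        alt_reads = 0
        for _ in range(n_individuals):
            # Genotype = number of copies of the allele (0, 1 or 2),
            # drawn under Hardy-Weinberg proportions.
            genotype = (rng.random() < p) + (rng.random() < p)
            # Each read samples one of the two chromosomes at random;
            # sequencing error is ignored.
            alt_reads += sum(rng.random() < genotype / 2 for _ in range(depth))
        estimate = alt_reads / (n_individuals * depth)
        squared_error += (estimate - p) ** 2
    return (squared_error / replicates) ** 0.5


if __name__ == "__main__":
    # Same total budget of 2000 reads, allocated at different depths.
    for depth in (1, 2, 5, 10, 20):
        analytic_sd = expected_variance(0.3, 2000, depth) ** 0.5
        print(depth, 2000 // depth, round(analytic_sd, 4),
              round(simulate_rmse(0.3, 2000, depth), 4))
```

In this toy setting the error of the frequency estimate grows as the same budget is concentrated on fewer individuals, which is the qualitative pattern the abstract describes for population-level parameters. It says nothing about individual-level parameters such as genotypes, for which the abstract points to Bayesian population models.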



