Inra, UMR CBGP (INRA - IRD - Cirad - Montpellier SupAgro), Campus international de Baillarguet, CS 30016, F-34988, Montferrier-sur-Lez, France.
Mol Ecol. 2013 Jun;22(11):3165-78. doi: 10.1111/mec.12089. Epub 2012 Oct 30.
Inexpensive short-read sequencing technologies applied to reduced representation genomes are revolutionizing genetic research, especially population genetics analysis, by allowing the genotyping of massive numbers of single-nucleotide polymorphisms (SNPs) for large numbers of individuals and populations. Restriction site-associated DNA (RAD) sequencing is a recent technique based on the characterization of genomic regions flanking restriction sites. One of its potential drawbacks is the presence of polymorphism within the restriction site, which makes it impossible to observe the associated SNP allele (i.e. allele dropout, ADO). To investigate the effect of ADO on genetic variation estimated from RAD markers, we first mathematically derived measures of the effect of ADO on allele frequencies as a function of different parameters within a single population. We then used RAD data sets simulated using a coalescence model to investigate the magnitude of biases induced by ADO on the estimation of expected heterozygosity and F_ST under a simple demographic model of divergence between two populations. We found that ADO tends to lead to overestimation of genetic variation both within and between populations. Assuming a mutation rate per nucleotide between 10^-9 and 10^-8, this bias remained low for most studied combinations of divergence time and effective population size, except for large effective population sizes. Averaging F_ST values over multiple SNPs, for example, by sliding window analysis, did not correct ADO biases. We briefly discuss possible solutions to filter the most problematic cases of ADO using read coverage to detect markers with a large excess of null alleles.
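To make the ADO mechanism concrete, here is a minimal Python sketch of how dropping haplotypes with a mutated restriction site can shift the observed SNP allele frequency and the expected heterozygosity computed from it. It is a toy illustration, not the derivation or simulation used in the study. The function names, the haplotype-count representation and the example counts are all assumptions made for illustration, and the model works at the haplotype level only, so it ignores diploid genotype calling. In this configuration the null allele sits on the major-allele background and heterozygosity is inflated; placing it on the minor-allele background would shift the estimate the other way.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def true_frequency(hap_counts):
    """Frequency of SNP allele 'A' among all haplotypes, observed or not."""
    total = sum(hap_counts.values())
    n_a_upper = sum(c for (allele, _), c in hap_counts.items() if allele == "A")
    return n_a_upper / total


def observed_frequency_under_ado(hap_counts):
    """Frequency of 'A' among haplotypes whose restriction site is intact.

    hap_counts maps (snp_allele, site_state) -> count. site_state is 'cut'
    (site intact, reads recovered) or 'null' (site mutated, haplotype lost).
    """
    seen_upper = hap_counts.get(("A", "cut"), 0)
    seen_lower = hap_counts.get(("a", "cut"), 0)
    if seen_upper + seen_lower == 0:
        raise ValueError("no haplotype with an intact restriction site")
    return seen_upper / (seen_upper + seen_lower)


# Hypothetical counts: the restriction-site mutation arose on an 'a' haplotype.
counts = {
    ("A", "cut"): 10,
    ("a", "cut"): 60,
    ("a", "null"): 30,  # these haplotypes drop out of the RAD data
}

p_true = true_frequency(counts)
p_obs = observed_frequency_under_ado(counts)

print(round(p_true, 4), round(expected_heterozygosity(p_true), 4))  # 0.1 0.18
print(round(p_obs, 4), round(expected_heterozygosity(p_obs), 4))    # 0.1429 0.2449
```

Keeping the true and observed frequencies as separate functions over the same counts makes it easy to try other configurations, for example moving the null allele to the ("A", "null") class.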