Department of Biology, Stanford University, Stanford CA 94305.
G3 (Bethesda). 2019 Dec 3;9(12):4159-4168. doi: 10.1534/g3.119.400755.
Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally pooled and simulated samples, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as to different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby increased power to detect adaptive alleles.
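The core idea of a haplotype-derived allele frequency can be illustrated with a minimal sketch. It assumes that founder haplotype frequencies for one genomic window have already been inferred by some upstream step, and that each founder is a homozygous strain with a known 0/1 genotype at every bi-allelic SNP in the window. The function name, the data layout, and the toy numbers are hypothetical and chosen for illustration only; this is not the HAF-pipe implementation.

```python
def haplotype_derived_af(founder_genotypes, haplotype_freqs):
    """Estimate alternate-allele frequencies for the SNPs in one window.

    founder_genotypes: one row per founder, each a list of 0/1 alleles
        (1 = founder carries the alternate allele at that SNP).
    haplotype_freqs: inferred frequency of each founder haplotype in the
        window (assumed input; normalised here in case it does not sum to 1).
    """
    if len(founder_genotypes) != len(haplotype_freqs):
        raise ValueError("need exactly one frequency per founder")
    total = sum(haplotype_freqs)
    if total <= 0:
        raise ValueError("haplotype frequencies must sum to a positive value")

    n_snps = len(founder_genotypes[0])
    allele_freqs = []
    for snp in range(n_snps):
        # Sum the frequencies of all founders carrying the alternate allele.
        carried = sum(
            freq * row[snp]
            for freq, row in zip(haplotype_freqs, founder_genotypes)
        )
        allele_freqs.append(carried / total)
    return allele_freqs


# Toy window: 4 founders, 3 SNPs (made-up values, not data from the paper).
founders = [
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
freqs = [0.4, 0.3, 0.2, 0.1]
window_afs = haplotype_derived_af(founders, freqs)
```

Because every SNP in the window shares the same handful of haplotype frequencies, reads from anywhere in the window contribute to each SNP's estimate, which is the intuition behind accurate frequencies at shallow depth.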