Iranmehr Arya, Akbari Ali, Schlötterer Christian, Bafna Vineet
Department of Electrical and Computer Engineering, University of California, San Diego, California 92093
Genetics. 2017 Jun;206(2):1011-1023. doi: 10.1534/genetics.116.197566. Epub 2017 Apr 10.
The advent of next-generation sequencing technologies has made whole-genome and whole-population sampling possible, even for eukaryotes with large genomes. With this development, experimental evolution studies can be designed to observe molecular evolution "in action" via evolve-and-resequence (E&R) experiments. Among other applications, E&R studies can be used to locate the genes and variants responsible for genetic adaptation. Most existing methods for analyzing such time-series data assume large population sizes, accurate allele frequency estimates, or wide time spans, and these assumptions do not hold in many E&R studies. In this article, we propose a method, composition of likelihoods for evolve-and-resequence experiments (Clear), to identify signatures of selection in small-population E&R experiments. Clear takes whole-genome sequences of pools of individuals as input and properly addresses heterogeneous ascertainment bias resulting from uneven coverage. Clear also provides unbiased estimates of model parameters, including population size, selection strength, and dominance, while being computationally efficient. Extensive simulations show that Clear achieves higher power in detecting and localizing selection over a wide range of parameters, and is robust to variation in coverage. We applied the Clear statistic to multiple E&R experiments, including data from a study of adaptation of Drosophila melanogaster to alternating temperatures and a study of outcrossing yeast populations, and identified multiple regions under selection with genome-wide significance.
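The abstract describes Clear only at a high level. As a rough illustration of the kind of model involved, the following is a minimal Python sketch, not the authors' implementation. It assumes a diploid Wright-Fisher population of N individuals with genotype fitnesses 1 + s, 1 + hs and 1, treats pooled read counts as binomial draws given the true allele frequency and that time point's coverage, and scores a trajectory with a hidden Markov model (HMM) forward pass. The uniform starting prior, the fixed additive dominance (h = 0.5), the grid search over s, and all function names and example data are hypothetical choices made for this sketch.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass function, safe at p = 0 and p = 1."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def post_selection_freq(x, s, h):
    """Allele frequency after one round of diploid viability selection.

    Assumed genotype fitnesses: AA = 1 + s, Aa = 1 + h*s, aa = 1 (requires s > -1).
    """
    w_AA, w_Aa, w_aa = 1.0 + s, 1.0 + h * s, 1.0
    num = x * x * w_AA + x * (1.0 - x) * w_Aa
    den = x * x * w_AA + 2.0 * x * (1.0 - x) * w_Aa + (1.0 - x) ** 2 * w_aa
    return num / den


def wf_transition(n_ind, s, h):
    """One-generation Wright-Fisher transition matrix over allele counts 0..2N."""
    m = 2 * n_ind
    matrix = []
    for i in range(m + 1):
        p = post_selection_freq(i / m, s, h)
        matrix.append([binom_pmf(j, m, p) for j in range(m + 1)])
    return matrix


def step(dist, matrix):
    """Propagate a distribution over allele counts by one generation."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def forward_log_likelihood(obs, matrix):
    """HMM forward pass over (generation, alt_reads, depth) observations.

    Hidden state: true allele count in the population.
    Emission: alt_reads ~ Binomial(depth, count / 2N), so each time point
    is weighted by its own coverage.
    """
    m = len(matrix) - 1
    alpha = [1.0 / (m + 1)] * (m + 1)  # assumption: uniform prior at first sample
    log_lik = 0.0
    prev_gen = obs[0][0]
    for gen, alt, depth in obs:
        for _ in range(gen - prev_gen):
            alpha = step(alpha, matrix)
        prev_gen = gen
        alpha = [a * binom_pmf(alt, depth, i / m) for i, a in enumerate(alpha)]
        total = sum(alpha)  # P(this observation | earlier observations)
        log_lik += math.log(total)
        alpha = [a / total for a in alpha]
    return log_lik


def composite_log_likelihood(replicates, n_ind, s, h=0.5):
    """Sum per-unit log-likelihoods, treating units as independent.

    Units here are replicates; summing over linked SNPs as if they were
    independent is what would make this a composite likelihood.
    """
    matrix = wf_transition(n_ind, s, h)
    return sum(forward_log_likelihood(obs, matrix) for obs in replicates)


def estimate_s(replicates, n_ind, h=0.5, grid=None):
    """Grid-search MLE of s (h fixed) and a likelihood-ratio statistic vs. s = 0."""
    if grid is None:
        grid = [round(0.05 * k, 2) for k in range(-6, 7)]  # s in [-0.3, 0.3]
    scores = {s: composite_log_likelihood(replicates, n_ind, s, h) for s in grid}
    if 0.0 not in scores:
        scores[0.0] = composite_log_likelihood(replicates, n_ind, 0.0, h)
    s_hat = max(scores, key=scores.get)
    return s_hat, 2.0 * (scores[s_hat] - scores[0.0])


if __name__ == "__main__":
    # Hypothetical pooled read counts: (generation, alt reads, depth).
    trajectory = [(0, 10, 100), (10, 40, 100), (20, 75, 100), (30, 95, 100)]
    s_hat, stat = estimate_s([trajectory] * 3, n_ind=25)
    print(f"s_hat = {s_hat}, LR statistic = {stat:.2f}")
```

Because the hidden state is the population allele count rather than the observed read frequency, a low-depth time point contributes a flat emission term and carries little weight instead of being taken at face value. The (2N + 1) x (2N + 1) transition matrix keeps this tractable only for small N, which is the regime the abstract targets; estimating N or h as well would require extending the search to those parameters.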