Department of Ecology and Evolution, University of Chicago, Chicago, IL.
Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, Davis, CA.
Mol Biol Evol. 2018 Apr 1;35(4):1003-1017. doi: 10.1093/molbev/msy006.
The haplotypes of a beneficial allele carry information about its history that can shed light on its age and the putative cause for its increase in frequency. Specifically, the signature of an allele's age is contained in the pattern of variation that mutation and recombination impose on its haplotypic background. We provide a method to exploit this pattern and infer the time to the common ancestor of a positively selected allele following a rapid increase in frequency. We do so using a hidden Markov model which leverages the length distribution of the shared ancestral haplotype, the accumulation of derived mutations on the ancestral background, and the surrounding background haplotype diversity. Using simulations, we demonstrate how the inclusion of information from both mutation and recombination events increases accuracy relative to approaches that only consider a single type of event. We also show the behavior of the estimator in cases where data do not conform to model assumptions, and provide some diagnostics for assessing and improving inference. Using the method, we analyze population-specific patterns in the 1000 Genomes Project data to estimate the timing of adaptation for several variants which show evidence of recent selection and functional relevance to diet, skin pigmentation, and morphology in humans.
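To illustrate why combining mutation and recombination events can sharpen an age estimate, here is a minimal sketch under a much simpler model than the hidden Markov model described above. It assumes a star-shaped genealogy in which every carrier lineage descends independently from the common ancestor t generations ago, the ancestral haplotype length on one side of the selected site is exponentially distributed with rate r·t, and the number of derived mutations on that tract is Poisson with mean μ·t·L. Under these assumptions the joint maximum-likelihood estimate has the closed form t = (n + Σm) / ((r + μ) ΣL). The function names, parameters, and numeric inputs are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from typing import Sequence


def _check(lengths_bp: Sequence[float], derived_counts: Sequence[int]) -> None:
    # Both inputs describe the same carrier haplotypes, one entry per lineage.
    if len(lengths_bp) == 0 or len(lengths_bp) != len(derived_counts):
        raise ValueError("need one length and one mutation count per haplotype")
    if sum(lengths_bp) <= 0:
        raise ValueError("total ancestral haplotype length must be positive")


def tmrca_recombination_only(lengths_bp: Sequence[float], r: float) -> float:
    """MLE of t from tract lengths alone: L_i ~ Exponential(rate = r * t)."""
    if len(lengths_bp) == 0 or sum(lengths_bp) <= 0:
        raise ValueError("need at least one positive tract length")
    return len(lengths_bp) / (r * sum(lengths_bp))


def tmrca_mutation_only(lengths_bp: Sequence[float],
                        derived_counts: Sequence[int], mu: float) -> float:
    """MLE of t from mutation counts alone: m_i ~ Poisson(mu * t * L_i)."""
    _check(lengths_bp, derived_counts)
    return sum(derived_counts) / (mu * sum(lengths_bp))


def tmrca_joint(lengths_bp: Sequence[float],
                derived_counts: Sequence[int], r: float, mu: float) -> float:
    """Joint MLE of t (in generations) using both kinds of event.

    Assumes a star genealogy (independent lineages), a simplification
    made for this sketch, with per-bp, per-generation rates r and mu.
    """
    _check(lengths_bp, derived_counts)
    n = len(lengths_bp)
    return (n + sum(derived_counts)) / ((r + mu) * sum(lengths_bp))


if __name__ == "__main__":
    # Illustrative inputs only; not taken from any real data set.
    lengths = [100_000, 200_000, 300_000]   # ancestral tract lengths in bp
    counts = [1, 2, 3]                      # derived mutations per tract
    r = mu = 1e-8                           # assumed per-bp rates
    print("recombination only:", tmrca_recombination_only(lengths, r))
    print("mutation only:     ", tmrca_mutation_only(lengths, counts, mu))
    print("joint:             ", tmrca_joint(lengths, counts, r, mu))
```

In this toy model the joint estimate pools the event counts and the rates, so it is based on more observed events than either single-source estimate. The sketch ignores shared ancestry among lineages, tract censoring, and background haplotype diversity.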