Chen Hua, Hey Jody, Slatkin Montgomery
Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Center for Computational Genetics and Genomics, Temple University, Philadelphia PA 19122, United States.
Center for Computational Genetics and Genomics, Temple University, Philadelphia PA 19122, United States.
Theor Popul Biol. 2015 Feb;99:18-30. doi: 10.1016/j.tpb.2014.11.001. Epub 2014 Nov 13.
Recent positive selection can increase the frequency of an advantageous mutant rapidly enough that a relatively long ancestral haplotype remains intact around it. We present a hidden Markov model (HMM) to identify such haplotype structures. Using the HMM-identified haplotype structures, we then adopt a population genetic model for the extent of ancestral haplotypes to infer the selection intensity and the allele age. Simulations show that this method can detect selection under a wide range of conditions and has higher power than the existing frequency spectrum-based method. In addition, it provides good estimates of the selection coefficients and allele ages for strong selection. The method analyzes large data sets in a reasonable amount of running time. We apply the method to HapMap III data in a genome scan and identify a list of candidate regions putatively under recent positive selection. We also apply it to several genes known to be under recent positive selection, including the LCT, KITLG and TYRP1 genes in Northern Europeans and OCA2 in East Asians, to estimate their allele ages and selection coefficients.
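The abstract does not specify the HMM's states, emissions, or parameters, so the following is only a generic illustration of the idea, not the authors' model. It is a minimal sketch that assumes two hidden states (a chromosome still carries the ancestral haplotype, or it has recombined onto the background), a 0/1 observation at each SNP (1 if the allele matches the ancestral haplotype), and hypothetical probabilities chosen purely for demonstration. Viterbi decoding then recovers where the ancestral segment most plausibly ends as one moves away from the selected site.

```python
import math

# Hypothetical two-state model; none of these values come from the paper.
STATES = ("ancestral", "background")
START = {"ancestral": 0.99, "background": 0.01}
TRANS = {
    "ancestral": {"ancestral": 0.95, "background": 0.05},
    # Returning to the ancestral haplotype is assumed to be very unlikely.
    "background": {"ancestral": 0.001, "background": 0.999},
}
EMIT = {
    "ancestral": {1: 0.99, 0: 0.01},   # mismatches only through rare errors
    "background": {1: 0.5, 0: 0.5},    # matches occur by chance
}


def viterbi(obs, start, trans, emit):
    """Return the most probable hidden-state path for a 0/1 sequence."""
    # Work in log space to avoid numerical underflow on long sequences.
    scores = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this SNP.
            prev = max(STATES, key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# SNPs ordered by increasing distance from the (assumed) selected site.
obs = [1] * 10 + [0, 0, 1, 0, 0, 1, 0, 0]
path = viterbi(obs, START, TRANS, EMIT)
n_ancestral = path.count("ancestral")
print("SNPs assigned to the ancestral haplotype:", n_ancestral)
```

In a sketch like this, the decoded length of the ancestral segment on each chromosome is the kind of quantity a downstream population genetic model could use; how the paper actually links haplotype extent to selection intensity and allele age is not described in the abstract.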