Duforet-Frebourg Nicolas, Bazin Eric, Blum Michael G B
Laboratoire TIMC-IMAG, UMR 5525, Centre National de la Recherche Scientifique, Université Joseph Fourier, Grenoble, France.
Laboratoire d'Ecologie Alpine, UMR 5553, Centre National de la Recherche Scientifique, Université Joseph Fourier, Grenoble, France.
Mol Biol Evol. 2014 Sep;31(9):2483-95. doi: 10.1093/molbev/msu182. Epub 2014 Jun 3.
There is a considerable impetus in population genomics to pinpoint loci involved in local adaptation. A powerful approach to find genomic regions subject to local adaptation is to genotype numerous molecular markers and look for outlier loci. One of the most common approaches for selection scans is based on statistics that measure population differentiation such as FST. However, there are important caveats with approaches related to FST because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here, we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. In order to identify outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that it can achieve a 2-fold or more reduction of false discovery rate compared with the software BayeScan or with an FST approach. We show that our software can handle large data sets by analyzing the single nucleotide polymorphisms of the Human Genome Diversity Project. The Bayesian factor model is implemented in the open-source PCAdapt software.
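To illustrate the idea of scoring loci by how strongly they relate to latent factors, the following is a minimal sketch and not the authors' method. It replaces the hierarchical Bayesian factor model with a plain truncated SVD (PCA) of the scaled genotype matrix, and it does not use the PCAdapt software or its interface. The function names, the simulated two-population data, and every parameter value are assumptions made only for this illustration.

```python
import numpy as np


def simulate_two_populations(n_per_pop=100, n_loci=500, n_adapted=10, seed=0):
    """Simulate diploid genotypes (0/1/2) for two populations (hypothetical toy data).

    The first `n_adapted` loci are strongly differentiated between the
    populations; all other loci differ only through weak random drift.
    """
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.2, 0.8, size=n_loci)
    # Weak drift around the ancestral frequency, drawn independently per population.
    freq_a = np.clip(ancestral + rng.normal(0.0, 0.05, size=n_loci), 0.01, 0.99)
    freq_b = np.clip(ancestral + rng.normal(0.0, 0.05, size=n_loci), 0.01, 0.99)
    # Planted "locally adapted" loci with very different allele frequencies.
    freq_a[:n_adapted] = 0.1
    freq_b[:n_adapted] = 0.9
    geno_a = rng.binomial(2, freq_a, size=(n_per_pop, n_loci))
    geno_b = rng.binomial(2, freq_b, size=(n_per_pop, n_loci))
    return np.vstack([geno_a, geno_b]).astype(float)


def factor_outlier_scores(genotypes, n_factors=1):
    """Return, per locus, the share of variance explained by the top factors.

    Assumption: factors are estimated by SVD rather than by Bayesian inference.
    """
    n_individuals = genotypes.shape[0]
    centered = genotypes - genotypes.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for monomorphic loci
    scaled = centered / std
    # Columns of u are individual-level latent factors (unit norm, mean zero).
    u, _, _ = np.linalg.svd(scaled, full_matrices=False)
    factors = u[:, :n_factors]
    # Correlation between each locus and each factor.
    corr = factors.T @ scaled / np.sqrt(n_individuals)
    # Sum of squared correlations: close to 1 means the locus tracks
    # population structure far more closely than typical loci do.
    return (corr ** 2).sum(axis=0)


if __name__ == "__main__":
    genotypes = simulate_two_populations()
    scores = factor_outlier_scores(genotypes, n_factors=1)
    top = np.argsort(scores)[::-1][:10]
    print("Highest-scoring loci:", sorted(top.tolist()))
```

No individual is assigned to a population in the scoring step: the factor is estimated from the genotypes alone, which is the individual-based property the abstract emphasizes. A full analysis would additionally need a calibrated rule for declaring outliers, which this sketch omits.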