School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6BX, United Kingdom.
Genetics. 2010 Jun;185(2):587-602. doi: 10.1534/genetics.109.112391. Epub 2010 Apr 9.
We address the problem of finding evidence of natural selection from genetic data, accounting for the confounding effects of demographic history. In the absence of natural selection, gene genealogies should all be sampled from the same underlying distribution, often approximated by a coalescent model. Selection at a particular locus will lead to a modified genealogy, and this motivates a number of recent approaches for detecting the effects of natural selection in the genome as "outliers" under some models. The demographic history of a population affects the sampling distribution of genealogies, and therefore the observed genotypes and the classification of outliers. Since we cannot see genealogies directly, we have to infer them from the observed data under some model of mutation and demography. Thus the accuracy of an outlier-based approach depends to a greater or a lesser extent on the uncertainty about the demographic and mutational model. A natural modeling framework for this type of problem is provided by Bayesian hierarchical models, in which parameters, such as mutation rates and selection coefficients, are allowed to vary across loci. It has proved quite difficult computationally to implement fully probabilistic genealogical models with complex demographies, and this has motivated the development of approximations such as approximate Bayesian computation (ABC). In ABC the data are compressed into summary statistics, and computation of the likelihood function is replaced by simulation of data under the model. In a hierarchical setting one may be interested both in hyperparameters and parameters, and there may be very many of the latter; in a genetic model, for example, these may be parameters describing each of many loci or populations. This poses a problem for ABC in that one then requires summary statistics for each locus, which, if used naively, leads to a consequent difficulty in conditional density estimation.
We develop a general method for applying ABC to Bayesian hierarchical models, and we apply it to detect microsatellite loci influenced by local selection. We demonstrate using receiver operating characteristic (ROC) analysis that this approach has comparable performance to a full-likelihood method and outperforms it when mutation rates are variable across loci.
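The basic ABC idea described above (compress the data into summary statistics and replace likelihood evaluation with simulation) can be illustrated with a minimal rejection sampler. This is a generic sketch and not the hierarchical method developed in the paper: the toy Normal model, the uniform prior, the tolerance `eps`, and all function names are assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy simulator (an assumption): n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def summary(data):
    """Compress a data set into a single summary statistic (the sample mean)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws=20000, eps=0.1, seed=1):
    """Rejection ABC: keep prior draws whose simulated summary is near the observed one.

    No likelihood is evaluated; the simulator stands in for it.
    """
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw a candidate from the (assumed) prior
        s_sim = summary(simulate(mu, len(observed), rng))
        if abs(s_sim - s_obs) <= eps:  # accept if summaries are close enough
            accepted.append(mu)
    return accepted


if __name__ == "__main__":
    data_rng = random.Random(0)
    observed = simulate(2.0, 50, data_rng)  # pseudo-observed data, true mu = 2
    posterior = abc_rejection(observed)
    print("accepted draws:", len(posterior))
    print("approximate posterior mean:", statistics.fmean(posterior))
```

With one parameter and one summary statistic, a simple tolerance works. In a hierarchical model with a parameter per locus, the number of summary statistics grows with the number of loci, and comparisons of this naive kind become inefficient, which is the difficulty the abstract refers to.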