Department of Integrative Biology, University of California, Valley Life Sciences Building, Berkeley, CA, 94720, USA.
Department of Statistics, University of California, Evans Hall, Berkeley, CA, 94720, USA.
J Math Biol. 2024 Oct 4;89(5):47. doi: 10.1007/s00285-024-02146-0.
Estimation of admixture proportions has become one of the most commonly used computational tools in population genomics. However, there is remarkably little population genetic theory on the statistical properties of these variables. We develop theoretical results that accurately predict the means and variances of admixture proportions within a population using models with recombination and genetic drift. Based on established theory on measures of multilocus disequilibrium, we show that a set of recurrence relations can be used to derive expectations for higher moments of the distribution of admixture proportions. We obtain closed-form solutions for some special cases. Using these results, we develop a method for estimating admixture parameters from admixture proportions estimated by programs such as Structure or Admixture. We apply this method to HapMap 3 data and find that, as expected, the population history of African Americans is not best explained by a single admixture event between people of European and African ancestry. A model of constant gene flow starting 8 generations and ending 2 generations before the present gives the best fit.
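To make the idea of predicting the mean and variance of admixture proportions concrete, the following is a minimal sketch of a much simpler model than the one described in the abstract, and it is not the authors' method. It assumes a single admixture pulse, random mating, and that an individual's admixture proportion is exactly the average of its two parents' proportions. This deliberately ignores the extra variance contributed by segregation and a finite number of recombination events, which the theory in the abstract accounts for. The names `m`, `generations`, `n_individuals`, `simulate_single_pulse`, and `expected_variance` are illustrative assumptions, not taken from the paper. Under these assumptions the mean stays at `m` and the variance halves each generation.

```python
import numpy as np


def simulate_single_pulse(m, generations, n_individuals, seed=0):
    """Simulate admixture proportions after a single pulse (simplified model).

    Assumption: each offspring's admixture proportion is the plain average
    of two parents drawn at random (with replacement) from the previous
    generation. Linkage and segregation variance are ignored.
    """
    rng = np.random.default_rng(seed)
    # Generation 0: each founder comes entirely from source population 1
    # with probability m, otherwise entirely from source population 2.
    h = (rng.random(n_individuals) < m).astype(float)
    variances = [h.var()]
    for _ in range(generations):
        mothers = rng.integers(0, n_individuals, size=n_individuals)
        fathers = rng.integers(0, n_individuals, size=n_individuals)
        h = 0.5 * (h[mothers] + h[fathers])
        variances.append(h.var())
    return h, np.array(variances)


def expected_variance(m, g):
    """Variance after g generations in the simplified model: m(1-m) / 2^g."""
    return m * (1.0 - m) / 2.0 ** g


if __name__ == "__main__":
    h, variances = simulate_single_pulse(m=0.2, generations=6, n_individuals=100_000)
    for g, v in enumerate(variances):
        print(g, round(float(v), 5), round(expected_variance(0.2, g), 5))
```

In this simplified setting the variance alone already carries information about the time since admixture. A comparison of observed variances with such predictions is the kind of reasoning the abstract's method builds on, with recombination and drift handled explicitly.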