Vogl Claus
Institute of Animal Breeding and Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria.
Theor Popul Biol. 2014 Dec;98:19-27. doi: 10.1016/j.tpb.2014.10.002. Epub 2014 Oct 18.
The distribution of allele frequencies of a large number of biallelic sites is known as the "allele-frequency spectrum" or "site-frequency spectrum" (SFS). Without selection and in regions of relatively high recombination rate, sites may be assumed to be independently and identically distributed. With a beta equilibrium distribution of allelic proportions and binomial sampling, the likelihood of each site is a compound beta-binomial. The likelihood of the data and the posterior distribution of two parameters, the scaled mutation rate θ and the mutation bias α, are investigated both in the general case and for small scaled mutation rates θ. In the general case, an expectation-maximization (EM) algorithm is derived to obtain maximum-likelihood estimates of both parameters. With an appropriate prior distribution, a Markov chain Monte Carlo (MCMC) sampler to integrate the posterior distribution is also derived. As far as I am aware, previous maximum-likelihood or Bayesian estimators of θ explicitly or implicitly assume small scaled mutation rates, i.e., θ≪1. For θ≪1, maximum-likelihood estimators of both parameters are also derived using a Taylor series expansion of the beta-binomial distribution. The resulting estimator of θ is a variant of the Ewens-Watterson estimator and of the maximum-likelihood estimator derived with the Poisson Random Field approach. With a conjugate prior distribution, marginal and conditional beta posterior distributions are also derived for both parameters.
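The compound likelihood described in the abstract can be illustrated with a short Python sketch. It is not the paper's method, and it rests on several assumptions:

- The equilibrium allele proportion p is taken to follow Beta(θα, θ(1−α)). This parametrization is an assumption here, and the paper's scaling of θ may differ.
- The sample count y out of n chromosomes is Binomial(n, p).
- Sites are i.i.d.
- The paper's EM algorithm is replaced by a plain grid search over (θ, α).

All helper names (`beta_binomial_logpmf`, `sfs_loglik`, `grid_mle`, `small_theta_pmf`, `simulate_sfs`) are hypothetical.

`small_theta_pmf` gives the first-order limit for θ≪1, θα(1−α)n/(y(n−y)) = θα(1−α)(1/y + 1/(n−y)). It follows from letting the arguments of the beta function in the denominator go to zero. Summed over polymorphic classes, it yields the harmonic-number sum Σ 1/y familiar from the Ewens-Watterson estimator. It is shown only for intuition, not as the paper's Taylor-series estimator.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_beta(a: float, b: float) -> float:
    """Natural logarithm of the beta function B(a, b), via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(y: int, n: int, theta: float, alpha: float) -> float:
    """log P(y | n, theta, alpha) for one site.

    Assumed model: allele proportion p ~ Beta(theta*alpha, theta*(1 - alpha)),
    then y ~ Binomial(n, p); integrating p out gives the beta-binomial.
    """
    a, b = theta * alpha, theta * (1.0 - alpha)
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def sfs_loglik(counts: Sequence[int], n: int, theta: float, alpha: float) -> float:
    """Log-likelihood of an unfolded SFS, treating sites as i.i.d.

    counts[y] is the number of sites where the focal allele is seen y times
    among n sampled chromosomes (y = 0..n, monomorphic sites included).
    """
    return sum(c * beta_binomial_logpmf(y, n, theta, alpha)
               for y, c in enumerate(counts) if c > 0)


def grid_mle(counts: Sequence[int], n: int,
             thetas: Sequence[float], alphas: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood (theta, alpha) by exhaustive grid search.

    A deliberately simple stand-in for the EM algorithm derived in the paper.
    """
    return max(((t, a) for t in thetas for a in alphas),
               key=lambda ta: sfs_loglik(counts, n, ta[0], ta[1]))


def small_theta_pmf(y: int, n: int, theta: float, alpha: float) -> float:
    """First-order approximation for a polymorphic site (0 < y < n), theta << 1:

    P(y) ~ theta * alpha * (1 - alpha) * n / (y * (n - y))
         = theta * alpha * (1 - alpha) * (1/y + 1/(n - y)).
    """
    return theta * alpha * (1.0 - alpha) * n / (y * (n - y))


def simulate_sfs(n: int, n_sites: int, theta: float, alpha: float,
                 rng: random.Random) -> List[int]:
    """Simulate i.i.d. sites under the same assumed beta-binomial model."""
    counts = [0] * (n + 1)
    for _ in range(n_sites):
        p = rng.betavariate(theta * alpha, theta * (1.0 - alpha))
        y = sum(rng.random() < p for _ in range(n))  # one Binomial(n, p) draw
        counts[y] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    counts = simulate_sfs(n=10, n_sites=20000, theta=0.5, alpha=0.3, rng=rng)
    thetas = [0.05 * k for k in range(1, 41)]  # 0.05 .. 2.00
    alphas = [0.05 * k for k in range(1, 20)]  # 0.05 .. 0.95
    # The estimates should fall near the simulated values (0.5, 0.3).
    print(grid_mle(counts, 10, thetas, alphas))
```

The grid search is used only because it is easy to verify. For real data, a numerical optimizer or the EM algorithm described in the paper would be the more natural choice. A finite grid also limits the estimates to its spacing.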