Zhang Wei, Liu Jing, Goodman Jesse, Weir Bruce S, Fewster Rachel M
Department of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand.
Department of Statistics, University of Auckland, Private Bag 92019, Auckland, New Zealand; University of Michigan - Shanghai Jiao Tong University Joint Institute, Shanghai 200240, China.
Theor Popul Biol. 2019 Aug;128:19-26. doi: 10.1016/j.tpb.2019.05.002. Epub 2019 May 27.
The linkage disequilibrium coefficient r is a measure of the statistical dependence between the alleles an individual carries at different genetic loci. It is widely used in association studies to locate disease-causing genes on chromosomes. Most studies to date treat r as a fixed property of two loci in a finite population and investigate the sampling distribution of its estimators that arises from the statistical sampling of individuals from the population. Here, we instead consider the distribution of r itself under a process of genetic sampling through the generations. Using a classical two-locus model for genetic drift, mutation, and recombination, we investigate the probability density function of r at stationarity. This density function provides a tool for inference on evolutionary parameters such as mutation and recombination rates. We reconstruct the approximate stationary density of r by calculating a finite sequence of the distribution's moments and applying the maximum entropy principle. Our approach is based on the diffusion approximation, under which we demonstrate that, for certain models in population genetics, moments of the stationary distribution can be obtained without knowing the probability distribution itself. To illustrate our approach, we show how the stationary probability density of r can be used in a maximum likelihood framework to estimate mutation and recombination rates from sample data of r.
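The abstract does not restate the formula for r. For reference, the standard textbook definition for two biallelic loci (alleles A/a and B/b) is D = p_AB - p_A p_B and r = D / sqrt(p_A(1 - p_A) p_B(1 - p_B)), so r lies in [-1, 1] and is 0 under independence. A minimal sketch from the four haplotype frequencies follows; the function name `ld_r` is hypothetical and not taken from the paper.

```python
import math


def ld_r(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """Linkage disequilibrium coefficient r from two-locus haplotype frequencies.

    Uses the standard definition r = D / sqrt(pA(1-pA) pB(1-pB)), D = pAB - pA*pB.
    """
    if not math.isclose(p_AB + p_Ab + p_aB + p_ab, 1.0):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at the first locus
    p_B = p_AB + p_aB          # frequency of allele B at the second locus
    d = p_AB - p_A * p_B       # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r is undefined when either locus is monomorphic")
    return d / math.sqrt(denom)
```

For example, `ld_r(0.5, 0.0, 0.0, 0.5)` gives 1.0 (complete positive dependence) and equal haplotype frequencies of 0.25 give 0.0.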