Hudson R R, Kaplan N L
Genetics. 1985 Sep;111(1):147-64. doi: 10.1093/genetics/111.1.147.
Some statistical properties of samples of DNA sequences are studied under an infinite-site neutral model with recombination. The two quantities of interest are R, the number of recombination events in the history of a sample of sequences, and RM, the number of recombination events that can be parsimoniously inferred from a sample of sequences. Formulas are derived for the mean and variance of R. In contrast to R, RM can be determined from the sample. Since no formulas are known for the mean and variance of RM, they are estimated with Monte Carlo simulations. It is found that RM is often much less than R; therefore, the number of recombination events may be greatly underestimated in a parsimonious reconstruction of the history of a sample. The statistic RM can be used to estimate the product of the recombination rate and the population size or, if the recombination rate is known, to estimate the population size. To illustrate this, DNA sequences from the Adh region of Drosophila melanogaster are used to estimate the effective population size of this species.
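Computing RM from a sample can be illustrated with a minimal sketch. It assumes biallelic segregating sites coded as 0/1 strings (one string per sequence, sites in physical order) and uses two steps: the four-gamete test to find incompatible site pairs, each of which requires at least one recombination between the two sites, and a greedy count of the largest set of non-overlapping incompatibility intervals. This is a common way of computing this lower bound, not a reproduction of the paper's own code; the function names `incompatible` and `hudson_kaplan_rm` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def incompatible(site_a: Sequence[int], site_b: Sequence[int]) -> bool:
    """Four-gamete test (assumes biallelic 0/1 sites): two sites are
    incompatible when all four haplotypes 00, 01, 10, 11 occur in the sample."""
    return len(set(zip(site_a, site_b))) == 4


def hudson_kaplan_rm(haplotypes: List[str]) -> int:
    """Parsimony lower bound RM on the number of recombination events.

    haplotypes: equal-length '0'/'1' strings, one per sampled sequence;
    columns are segregating sites in physical order (infinite-sites assumption).
    """
    n_sites = len(haplotypes[0])
    # Transpose the sample so each entry holds one site's alleles.
    columns = [[int(h[k]) for h in haplotypes] for k in range(n_sites)]

    # Each incompatible pair (i, j) implies at least one recombination
    # somewhere between site i and site j.
    intervals: List[Tuple[int, int]] = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if incompatible(columns[i], columns[j])
    ]

    # Greedy interval scheduling: take the interval that ends first, then
    # skip every interval overlapping it. Intervals that share only an
    # endpoint site are counted as disjoint, because their breakpoints must
    # fall in different gaps between adjacent sites.
    count, last_end = 0, -1
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count
```

For example, `hudson_kaplan_rm(["00", "01", "10", "11"])` returns 1, because the two sites carry all four gametes, while a sample whose sites are all pairwise compatible returns 0. Taking intervals in order of their right endpoint is what makes the greedy count maximal.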