Felsenstein J
Department of Genetics SK-50, University of Washington, Seattle 98195.
Genet Res. 1992 Apr;59(2):139-47. doi: 10.1017/s0016672300030354.
It is known that under neutral mutation at a known mutation rate a sample of nucleotide sequences, within which there is assumed to be no recombination, allows estimation of the effective size of an isolated population. This paper investigates the case of very long sequences, where each pair of sequences allows a precise estimate of the divergence time of those two gene copies. The average divergence time of all pairs of copies estimates twice the effective population number and an estimate can also be derived from the number of segregating sites. One can alternatively estimate the genealogy of the copies. This paper shows how a maximum likelihood estimate of the effective population number can be derived from such a genealogical tree. The pairwise and the segregating sites estimates are shown to be much less efficient than this maximum likelihood estimate, and this is verified by computer simulation. The result implies that there is much to gain by explicitly taking the tree structure of these genealogies into account.
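As a rough illustration of the comparison described above, the following Python sketch simulates genealogies and contrasts a pairwise estimate with a likelihood-based estimate computed from the coalescence intervals. It is not taken from the paper. It assumes a diploid population (so that k lineages coalesce at rate k(k-1)/(4N) per generation), assumes that divergence times are known exactly in generations (the "very long sequences" limit with a known mutation rate), and uses hypothetical function names (`simulate_genealogy`, `ml_estimate`, `pairwise_estimate`). The segregating sites estimate is not covered.

```python
import random

def simulate_genealogy(n, N, rng):
    """Simulate one coalescent genealogy for n gene copies.

    Assumption: diploid population of effective size N, so k lineages
    coalesce at rate k(k-1)/(4N) per generation.
    Returns (intervals, pair_times):
      intervals[k] = time (generations) during which k lineages existed
      pair_times   = divergence time of every pair of sampled copies
    """
    sizes = [1] * n            # number of sampled copies under each lineage
    t = 0.0
    intervals = {}
    pair_times = []
    while len(sizes) > 1:
        k = len(sizes)
        u = rng.expovariate(k * (k - 1) / (4.0 * N))
        intervals[k] = u
        t += u
        i, j = rng.sample(range(k), 2)       # pick two lineages to merge
        # Every pair split across the two merging lineages diverged at time t.
        pair_times.extend([t] * (sizes[i] * sizes[j]))
        merged = sizes[i] + sizes[j]
        sizes = [s for idx, s in enumerate(sizes) if idx not in (i, j)]
        sizes.append(merged)
    return intervals, pair_times

def ml_estimate(intervals):
    """Maximum likelihood estimate of N from exponential coalescence intervals."""
    n = max(intervals)
    return sum(k * (k - 1) * u for k, u in intervals.items()) / (4.0 * (n - 1))

def pairwise_estimate(pair_times):
    """Mean pairwise divergence time estimates 2N, so halve it."""
    return sum(pair_times) / len(pair_times) / 2.0

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

if __name__ == "__main__":
    rng = random.Random(1)
    N, n, reps = 1000.0, 10, 2000
    ml, pw = [], []
    for _ in range(reps):
        intervals, pair_times = simulate_genealogy(n, N, rng)
        ml.append(ml_estimate(intervals))
        pw.append(pairwise_estimate(pair_times))
    print("mean ML estimate:      ", sum(ml) / reps)
    print("mean pairwise estimate:", sum(pw) / reps)
    print("variance ratio (pairwise / ML):", variance(pw) / variance(ml))
```

Under these assumptions both estimators are unbiased for N, but the interval-based estimate averages n-1 independent exponential quantities, whereas the pairwise estimate is dominated by the deepest splits of a single tree, so its variance is expected to be noticeably larger in the simulation.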