Sun James X, Mullikin James C, Patterson Nick, Reich David E
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Mol Biol Evol. 2009 May;26(5):1017-27. doi: 10.1093/molbev/msp025. Epub 2009 Feb 12.
Microsatellite length mutations are often modeled using the generalized stepwise mutation process, which is a type of random walk. If this model is sufficiently accurate, one can estimate the coalescence time between alleles of a locus after a mathematical transformation of the allele lengths. When large-scale microsatellite genotyping first became possible, there was substantial interest in using this approach to make inferences about time and demography, but that interest has waned because it has not been possible to empirically validate the clock by comparing it with data in which the mutation process is well understood. We analyzed data from 783 microsatellite loci in human populations and 292 loci in chimpanzee populations, and compared them with up to one gigabase of aligned sequence data, where the molecular clock based upon nucleotide substitutions is believed to be reliable. We empirically demonstrate a remarkable linearity (r² > 0.95) between the microsatellite average square distance statistic and sequence divergence. We demonstrate that microsatellites are accurate molecular clocks for coalescent times of at least 2 million years (My). We apply this insight to confirm that the African populations San, Biaka Pygmy, and Mbuti Pygmy have the deepest coalescent times among populations in the Human Genome Diversity Project. Furthermore, we show that microsatellites support unbiased estimates of population differentiation (F_ST) that are less subject to ascertainment bias than single nucleotide polymorphism (SNP) F_ST. These results raise the prospect of using microsatellite data sets to determine parameters of population history. When genotyped along with SNPs, microsatellite data can also be used to correct for SNP ascertainment bias.
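As an illustration of the "mathematical transformation of the allele lengths" mentioned in the abstract, the following is a minimal sketch of an average square distance (ASD) calculation between two samples. It assumes alleles are coded as repeat counts and uses the textbook stepwise-model expectation E[ASD] = 2 * mu * sigma^2 * t to convert ASD into a coalescence time in generations. The function names, the toy genotypes, and the mutation rate are hypothetical, and this is not necessarily the exact estimator used by the authors.

```python
from itertools import product


def average_square_distance(alleles_a, alleles_b):
    """Mean squared difference in repeat count over all cross-sample allele pairs at one locus."""
    pairs = list(product(alleles_a, alleles_b))
    if not pairs:
        raise ValueError("both samples need at least one allele")
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def mean_asd(loci_a, loci_b):
    """Average the per-locus ASD across loci (loci are matched by position)."""
    if len(loci_a) != len(loci_b) or not loci_a:
        raise ValueError("need the same non-zero number of loci in both samples")
    per_locus = [average_square_distance(a, b) for a, b in zip(loci_a, loci_b)]
    return sum(per_locus) / len(per_locus)


def coalescence_time_generations(asd, mutation_rate, step_variance=1.0):
    """Invert E[ASD] = 2 * mu * sigma^2 * t (assumed stepwise-model expectation)."""
    return asd / (2.0 * mutation_rate * step_variance)


# Toy data: two loci, two alleles per sample, coded as repeat counts (hypothetical).
sample_a = [[10, 12], [20, 20]]
sample_b = [[11, 13], [22, 24]]

asd = mean_asd(sample_a, sample_b)  # per-locus values are 3.0 and 10.0, so the mean is 6.5
# mutation_rate is an illustrative placeholder, not a value taken from the paper.
t_gen = coalescence_time_generations(asd, mutation_rate=1e-3)
print(asd, t_gen)
```

Averaging over many loci matters because the per-locus ASD has high variance under a random-walk mutation process, so a single locus gives a very noisy time estimate.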