Wilson I J, Balding D J
School of Biological Sciences, Queen Mary and Westfield College, University of London, London E1 4NS, England.
Genetics. 1998 Sep;150(1):499-510. doi: 10.1093/genetics/150.1.499.
Ease and accuracy of typing, together with high levels of polymorphism and widespread distribution in the genome, make microsatellite (or short tandem repeat) loci an attractive potential source of information about both population histories and evolutionary processes. However, microsatellite data are difficult to interpret, in particular because of the frequency of back-mutations. Stochastic models for the underlying genetic processes can be specified, but in the past they have been too complicated for direct analysis. Recent developments in stochastic simulation methodology now allow direct inference about both historical events, such as genealogical coalescence times, and evolutionary parameters, such as mutation rates. A feature of the Markov chain Monte Carlo (MCMC) algorithm that we propose here is that the likelihood computations are simplified by treating the (unknown) ancestral allelic states as auxiliary parameters. We illustrate the algorithm by analyzing microsatellite samples simulated under the model. Our results suggest that a single microsatellite usually does not provide enough information for useful inferences, but that several completely linked microsatellites can be informative about some aspects of genealogical history and evolutionary processes. We also reanalyze data from a previously published human Y chromosome microsatellite study, finding evidence for an effective population size for human Y chromosomes in the low thousands and a recent time since their most recent common ancestor: the 95% interval runs from approximately 15,000 to 130,000 years, with most likely values around 30,000 years.
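The simplification from treating ancestral allelic states as auxiliary parameters can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a symmetric single-step mutation model (each mutation adds or removes one repeat with equal probability), a mutation rate `mu` per unit branch length, and a genealogy supplied as a flat list of branches. The function names and the toy tree are hypothetical. Once a repeat count is assigned to every ancestral node, the likelihood factorizes into one term per branch, and each term depends only on the net change in repeat number along that branch.

```python
import math


def step_change_prob(d, mean_mutations, terms=40):
    """Probability of a net change of d repeats along one branch.

    Assumed model: the number of mutations is Poisson with mean
    `mean_mutations`, and each mutation is +1 or -1 with probability 1/2.
    Summing over k pairs of cancelling (back-mutation) steps gives
    exp(-m) * sum_k (m/2)^(2k+|d|) / (k! (k+|d|)!).
    """
    d = abs(d)
    half = mean_mutations / 2.0
    total = 0.0
    for k in range(terms):
        # |d| + 2k mutations in total: k in one direction, k + |d| in the other
        total += half ** (2 * k + d) / (math.factorial(k) * math.factorial(k + d))
    return math.exp(-mean_mutations) * total


def augmented_log_likelihood(branches, mu):
    """Log-likelihood of a genealogy with all node states filled in.

    `branches` is a list of (parent_repeats, child_repeats, length) triples.
    With ancestral states given, the branches contribute independently.
    """
    return sum(
        math.log(step_change_prob(child - parent, mu * length))
        for parent, child, length in branches
    )


# Hypothetical three-leaf genealogy: root (12 repeats) -> internal node (13)
# -> two leaves, plus one leaf attached directly to the root.
toy_branches = [
    (12, 13, 0.4),  # root -> internal node
    (13, 13, 0.6),  # internal node -> leaf A (no net change)
    (13, 15, 0.6),  # internal node -> leaf B
    (12, 11, 1.0),  # root -> leaf C
]
toy_loglik = augmented_log_likelihood(toy_branches, mu=1.5)
```

In an MCMC scheme of this kind the ancestral repeat counts would be updated alongside the tree and the mutation rate, so a per-branch product like this is all that needs evaluating at each step. The `k > 0` terms in `step_change_prob` are the back-mutation paths: a branch with no net change can still carry several mutations.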