Fu Y X, Chakraborty R
Human Genetics Center, University of Texas, Houston, Texas 77225, USA.
Genetics. 1998 Sep;150(1):487-97. doi: 10.1093/genetics/150.1.487.
Minisatellites and microsatellites are short tandemly repetitive sequences dispersed in eukaryotic genomes, many of which are highly polymorphic due to copy number variation of the repeats. Because mutation changes copy numbers of the repeat sequences in a generalized stepwise fashion, stepwise mutation models are widely used for studying the dynamics of these loci. We propose a minimum chi-square (MCS) method for simultaneous estimation of all the parameters in a stepwise mutation model and the ancestral allelic type of a sample. The MCS estimator requires knowing the mean number of alleles of a certain size in a sample, which can be estimated using Monte Carlo samples generated by a coalescent algorithm. The method is applied to samples of seven (CA)n repeat loci from eight human populations and one chimpanzee population. The estimated values of the parameters suggest that there is a general tendency for microsatellite alleles to expand in size, because (1) each mutation has a slight tendency to cause a size increase and (2) the mean size increase is larger than the mean size decrease for a mutation. Our estimates also suggest that most of these CA-repeat loci evolve according to multistep mutation models rather than single-step mutation models. We also introduce several quantities for measuring the quality of the estimation of the ancestral allelic type, and the majority of the estimated ancestral allelic types appear to be reasonably accurate. Implications of our analysis and potential extensions of the method are discussed.

Since the discovery that a large number of loci with tandemly repeated sequences in humans and many other eukaryotic species are highly polymorphic because of copy number variation of the repeats in different individuals (Jeffreys 1985; Litt and Luty 1989; Weber and May 1989), allele size data from such loci are rapidly becoming the dominant source of genetic markers for genome mapping, forensic testing, and population studies.
Loci with repeat sequences longer than 5 bp are generally referred to as minisatellite or variable number tandem repeat loci, and those with repeat sequences of 2 to 5 bp are referred to as microsatellite or short tandem repeat loci (Tautz 1993). Because mutations change the copy number at such loci in a stepwise fashion, the rapid accumulation of population samples from minisatellite and microsatellite loci has revived interest in the stepwise mutation model (SMM), which was popular in the 1970s.
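
The Monte Carlo step described in the abstract (estimating the mean number of alleles of each size from coalescent samples, then comparing with observed counts) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the geometric distribution of step sizes (`q_geom`), the probability of an upward step (`p_up`), the time scaling in which mutations fall on a branch of length t at rate theta/2, and the Pearson-type form of the discrepancy. The exact MCS statistic and mutation model used by the authors are not given in this excerpt.

```python
import random
from collections import Counter


def poisson(rng, mean):
    """Poisson draw obtained by counting unit-rate arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def draw_step(rng, p_up, q_geom):
    """Signed change in repeat count for one mutation (hypothetical model).

    Magnitude is geometric on 1, 2, ... with success probability q_geom
    (q_geom must lie in (0, 1]; q_geom = 1 gives a single-step model).
    The step is an expansion with probability p_up.
    """
    size = 1
    while rng.random() > q_geom:
        size += 1
    return size if rng.random() < p_up else -size


def simulate_sample(n, theta, p_up, q_geom, ancestral, rng):
    """Allele sizes of n sampled genes from one coalescent genealogy."""
    active = list(range(n))            # lineages still to coalesce
    node_time = {i: 0.0 for i in range(n)}
    parent, branch = {}, {}
    time, next_id = 0.0, n
    while len(active) > 1:
        k = len(active)
        time += rng.expovariate(k * (k - 1) / 2.0)   # waiting time to next merger
        a, b = rng.sample(active, 2)
        for child in (a, b):
            parent[child] = next_id
            branch[child] = time - node_time[child]
        node_time[next_id] = time
        active = [x for x in active if x not in (a, b)]
        active.append(next_id)
        next_id += 1
    root = active[0]
    # Parents always have larger ids than their children, so walking ids
    # downward visits every parent before its children.
    size = {root: ancestral}
    for node in range(root - 1, -1, -1):
        s = size[parent[node]]
        for _ in range(poisson(rng, theta / 2.0 * branch[node])):
            s += draw_step(rng, p_up, q_geom)   # no lower bound is enforced here
        size[node] = s
    return [size[i] for i in range(n)]


def mean_allele_counts(n, theta, p_up, q_geom, ancestral, reps, seed=0):
    """Monte Carlo estimate of the mean number of alleles of each size."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(reps):
        totals.update(simulate_sample(n, theta, p_up, q_geom, ancestral, rng))
    return {s: c / reps for s, c in totals.items()}


def chi_square(observed, expected, floor=1e-3):
    """Pearson-type discrepancy between observed and mean simulated counts.

    `floor` guards against division by zero for sizes never simulated.
    """
    sizes = set(observed) | set(expected)
    return sum(
        (observed.get(s, 0) - expected.get(s, 0.0)) ** 2
        / max(expected.get(s, 0.0), floor)
        for s in sizes
    )
```

Under these assumptions, a simultaneous estimate of the model parameters and the ancestral allelic type could be approximated by evaluating `chi_square` over a grid of `theta`, `p_up`, `q_geom`, and candidate `ancestral` sizes and keeping the combination with the smallest value. A grid search is only one possible choice of optimizer.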