Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA.
Genome Res. 2011 Dec;21(12):2038-48. doi: 10.1101/gr.122937.111. Epub 2011 Oct 12.
Microsatellites, tandem repeats of short DNA motifs, are abundant in the human genome and have high mutation rates. While microsatellite instability is implicated in numerous genetic diseases, the molecular processes involved in the emergence and disappearance of microsatellites are still not well understood. Microsatellites are hypothesized to follow a life cycle, wherein they are born and expand into adulthood, until their degradation and death. Here we identified microsatellite births/deaths in human, chimpanzee, and orangutan genomes, using macaque and marmoset as outgroups. We inferred mutations causing births/deaths based on parsimony, and investigated local genomic environments affecting them. We also studied birth/death patterns within transposable elements (Alus and L1s), coding regions, and disease-associated loci. We observed that substitutions were the predominant cause of births of short microsatellites, while insertions and deletions were important for births of longer microsatellites. Substitutions were the cause of deaths of microsatellites of virtually all lengths. AT-rich L1 sequences exhibited an elevated frequency of births/deaths over their entire length, whereas GC-rich Alus did so only in their 3' poly(A) tails and middle A-stretches, with differences depending on transposable element integration timing. Births/deaths were strongly selected against in coding regions. Births/deaths occurred in genomic regions with high substitution rates, protomicrosatellite content, and L1 density, but low GC content and Alu density. The majority of the 17 disease-associated microsatellites examined are evolutionarily ancient (acquired by the common ancestor of simians). Our genome-wide investigation of the microsatellite life cycle has fundamental applications for predicting the susceptibility of microsatellites, including many disease-causing loci, to birth/death.
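The parsimony step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a presence/absence call for a microsatellite locus is already available for each of the five species, and it applies standard Fitch parsimony to a fixed primate tree with macaque and marmoset as outgroups. The names `TREE`, `downpass`, `uppass`, and `infer_events` are hypothetical, and the sketch does not attempt to identify the causative mutation (substitution, insertion, or deletion), which the study inferred separately.

```python
# Assumed topology: ((((human, chimpanzee), orangutan), macaque), marmoset).
# Leaves are species names; internal nodes are 2-tuples of child nodes.
TREE = (((("human", "chimpanzee"), "orangutan"), "macaque"), "marmoset")


def leaves(node):
    """Return the species names below a node, in tree order."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def downpass(node, present, sets):
    """Fitch bottom-up pass: record the candidate state set of every node."""
    if isinstance(node, str):
        state_set = {present[node]}
    else:
        left = downpass(node[0], present, sets)
        right = downpass(node[1], present, sets)
        # Intersection if the children agree on some state, otherwise union.
        state_set = (left & right) or (left | right)
    sets[node] = state_set
    return state_set


def uppass(node, parent_state, sets, events):
    """Fitch top-down pass: fix states and log a change on each branch."""
    candidates = sets[node]
    # Keep the parent's state when allowed; otherwise the set is a singleton.
    state = parent_state if parent_state in candidates else next(iter(candidates))
    if state != parent_state:
        kind = "birth" if state else "death"
        events.append((kind, "+".join(leaves(node))))
    if not isinstance(node, str):
        uppass(node[0], state, sets, events)
        uppass(node[1], state, sets, events)


def infer_events(present, tree=TREE):
    """Return a list of (event, lineage) pairs, or None if the root is ambiguous."""
    sets = {}
    root_set = downpass(tree, present, sets)
    if len(root_set) != 1:
        return None  # the outgroups disagree, so the change cannot be polarized
    events = []
    uppass(tree, next(iter(root_set)), sets, events)
    return events


# Hypothetical locus: microsatellite present only in human.
human_only = {"human": True, "chimpanzee": False, "orangutan": False,
              "macaque": False, "marmoset": False}
print(infer_events(human_only))
```

In this sketch a locus is left unclassified (`None`) whenever the ancestral state at the root cannot be resolved, which is one conservative way to use two outgroups. A lineage label such as `human+chimpanzee+orangutan` denotes an event placed on the branch leading to the common ancestor of those species.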