Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA.
Nucleic Acids Res. 2012 Mar;40(6):2399-413. doi: 10.1093/nar/gkr1078. Epub 2011 Nov 28.
Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on-off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable types are most strongly avoided across genomes. We thoroughly characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins and enrich repeats near termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe over-enrichment of SSRs near protein N-termini significantly beyond the expectation based on structural constraints. This increases the probability that frameshifts result in non-functional proteins, revealing that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.
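The abstract does not spell out the codon-shuffling procedure, so the following is only a minimal sketch of one common way such a null model can be built: permute synonymous codons among the positions that encode the same amino acid, which preserves both the protein sequence and the gene's codon usage, and then compare SSR positions in the observed and shuffled sequences. The function names (`translate`, `shuffle_synonymous`, `ssr_positions`), the use of the standard genetic code, and the default threshold of five repeat copies are assumptions made for illustration, not details taken from the paper.

```python
import random
import re
from collections import defaultdict

# Standard genetic code in NCBI "TCAG" order (assumption: no recoding,
# alternative start codons are translated like internal codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence into a one-letter protein string."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    The protein sequence and the overall codon counts are unchanged;
    only the placement of synonymous codons is randomized.
    """
    codons = split_codons(cds)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def ssr_positions(cds, unit_len=1, min_copies=5):
    """Return the relative N->C start position (0 to 1) of each repeat tract.

    A tract is a unit of `unit_len` bases repeated at least `min_copies`
    times. Degenerate units (e.g. "AA" counted as a dinucleotide) are not
    filtered out in this sketch.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [m.start() / len(cds) for m in pattern.finditer(cds)]


if __name__ == "__main__":
    rng = random.Random(42)
    gene = "ATGAAAAAAGGTGGCAAGGGAAAATAA"  # toy sequence, not real data
    null = shuffle_synonymous(gene, rng)
    print(translate(gene) == translate(null))
    print(ssr_positions(gene), ssr_positions(null))
```

Repeating the shuffle many times per gene and binning the relative positions would give an expected density curve to compare against the observed one; how the paper separates structural constraints from selection for phase variation is not recoverable from the abstract alone.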