Jordan Philip, Snyder Lori A S, Saunders Nigel J
Bacterial Pathogenesis and Functional Genomics Group, The Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford, OX1 3RE, UK.
BMC Microbiol. 2003 Nov 12;3:23. doi: 10.1186/1471-2180-3-23.
Tandem repeats within coding regions can mediate phase variation when changes in repeat copy number shift the reading frame of the coding sequence. Coding tandem repeats, by contrast, are those in which copy number does not alter the reading frame; changes in their copy number may instead alter the function or antigenicity of the encoded protein. Three complete neisserial genomes were analyzed and compared to identify coding tandem repeats in which the number of copies of the repeat would have a structural consequence for the protein. This is the first study to address coding tandem repeats that may affect protein structure using comparative genomics, combined with a population survey to investigate which repeats show interstrain variability.
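The distinction between phase-variable repeats and coding tandem repeats comes down to whether the repeat unit length is a multiple of three. The following is a minimal sketch of that arithmetic only; the function names `frame_shift` and `preserves_frame` are hypothetical and are not taken from the study.

```python
def frame_shift(unit_length: int, copy_change: int) -> int:
    """Reading-frame offset (0, 1 or 2) after gaining or losing repeat copies.

    unit_length: length of one repeat unit in base pairs.
    copy_change: copies gained (positive) or lost (negative).
    """
    # Python's % always returns a non-negative result for a positive modulus,
    # so losses of copies are handled the same way as gains.
    return (unit_length * copy_change) % 3


def preserves_frame(unit_length: int) -> bool:
    """True if any change in copy number leaves the reading frame intact."""
    return unit_length % 3 == 0


# A 4 bp unit shifts the frame when one copy is gained (phase-variation style).
print(frame_shift(4, 1))      # 1
# A 9 bp unit never shifts the frame (coding tandem repeat style).
print(preserves_frame(9))     # True
```

Note that a unit whose length is not a multiple of three can still restore the original frame after certain copy number changes (for example, three additional copies of a 4 bp unit), which is why the check for a coding tandem repeat is made on the unit length rather than on a single observed change.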
A total of 28 genes were identified. Of these, 22 contain coding tandem repeats that vary in copy number between the three sequenced strains; three strain-specific genes were included on the basis of >90% identity between repeated units; and three genes with repeated elements of >250 bp were included although no length variation was seen in the genomes. These 28 coding tandem repeat-containing regions were amplified from a set of largely unrelated strains, and repeats showing altered copy number were sequenced, which revealed further repeat length variation in several cases.
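To illustrate the >90% identity criterion between repeated units, the sketch below computes percent identity by simple position-by-position comparison of two units of equal length. This is an assumption made for illustration: the abstract does not state how identity was calculated, and the name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(unit_a: str, unit_b: str) -> float:
    """Ungapped percent identity between two repeat units of equal length."""
    if not unit_a or len(unit_a) != len(unit_b):
        raise ValueError("units must be non-empty and of equal length")
    # Count positions where the two units carry the same base (case-insensitive).
    matches = sum(a == b for a, b in zip(unit_a.upper(), unit_b.upper()))
    return 100.0 * matches / len(unit_a)


# Two made-up 20 bp units differing at a single position.
unit_1 = "ATGGCTGAAACCGGTAAAGC"
unit_2 = "ATGGCTGAAACCGGTAAAGT"
identity = percent_identity(unit_1, unit_2)
print(identity)         # 95.0
print(identity > 90.0)  # True: this pair would meet a >90% threshold
```

Units of unequal length would need an alignment step first, which this sketch deliberately leaves out.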
Eighteen genes were identified that vary in repeat copy number between strains of the same species, twelve of which show greater diversity in repeat copy number than is present in the sequenced genomes. In some cases, this may reflect a mechanism for generating antigenic variation, as previously described in other species. However, some of the genes identified encode proteins with cytoplasmic functions, including sugar metabolism, DNA repair, and protein production, in which repeat length variation may serve other functions. Coding tandem repeats appear to represent a largely unexplored mechanism of generating diversity in Neisseria spp.