Feulner G, Gray J A, Kirschman J A, Lehner A F, Sadosky A B, Vlazny D A, Zhang J, Zhao S, Hill C W
Department of Biological Chemistry, Milton S. Hershey Medical Center, Pennsylvania State University, Hershey 17033.
J Bacteriol. 1990 Jan;172(1):446-56. doi: 10.1128/jb.172.1.446-456.1990.
The complete nucleotide sequence of the rhsA locus and selected portions of other members of the rhs multigene family of Escherichia coli K-12 have been determined. A definition of the limits of the rhsA and rhsC loci was established by comparing sequences from E. coli K-12 with sequences from an independent E. coli isolate whose DNA contains no homology to the rhs core. This comparison showed that rhsA comprises 8,249 base pairs (bp) in strain K-12 and that the Rhs0 strain, instead, contains an unrelated 32-bp sequence. Similarly, the K-12 rhsC locus is 9.6 kilobases in length and a 10-bp sequence resides at its location in the Rhs0 strain. The rhsA core, the highly conserved portion shared by all rhs loci, comprises a single open reading frame (ORF) 3,714 bp in length. The nucleotide sequence of the core ORF predicts an extremely hydrophilic 141-kilodalton peptide containing 28 repeats of a motif whose consensus is GxxxRYxYDxxGRL(I or T). One of the most novel aspects of the rhs family is the extension of the core ORF into the divergent adjacent region. Core extensions of rhsA, rhsB, rhsC, and rhsD add 139, 173, 159, and 177 codons to the carboxy termini of the respective core ORFs. For rhsA, the extended core protein would have a molecular mass of 156 kilodaltons. Core extensions of rhsB and rhsD are related, exhibiting 50.3% conservation of the predicted amino acid sequence. However, comparison of the core extensions of rhsA and rhsC at both the nucleotide and the predicted amino acid level reveals that each is highly divergent from the other three rhs loci. The highly divergent portion of the core extension is joined to the highly conserved core by a nine-codon segment of intermediate conservation. The rhsA and rhsC loci both contain partial repetitions of the core downstream from their primary cores. The question of whether the rhs loci should be considered accessory genetic elements is discussed but not resolved.
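The repeat consensus quoted in the abstract, GxxxRYxYDxxGRL(I or T), can be read as a simple pattern in which each x stands for any residue. The sketch below is only an illustration of that notation, not the authors' method: the helper names and the toy peptide are hypothetical, and the toy peptide is made up rather than taken from any rhs sequence. An exact-match scan like this would likely miss many of the 28 repeats in a real core ORF, since a consensus summarizes copies that deviate from it. The codon arithmetic at the end assumes the 3,714-bp figure is a whole number of codons; the abstract does not say whether it includes the stop codon.

```python
import re

# Consensus as written in the abstract; 'x' means "any residue".
CONSENSUS = "GxxxRYxYDxxGRL(I or T)"

# Made-up toy peptide (NOT a real rhs sequence): two exact copies of the
# consensus and one near-miss that ends in V instead of I or T.
TOY_PEPTIDE = "MK" + "GAAARYAYDAAGRLI" + "PPP" + "GSTQRYEYDPLGRLT" + "GAAARYAYDAAGRLV"


def consensus_to_regex(consensus: str) -> str:
    """Translate the abstract's notation into a regular expression."""
    pattern = consensus.replace("(I or T)", "[IT]")
    return pattern.replace("x", "[A-Z]")


def find_exact_matches(protein: str, consensus: str = CONSENSUS):
    """Return (start, matched_text) for every exact match, overlaps included."""
    # The lookahead lets the scan report matches that overlap each other.
    regex = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein)]


def codons_from_bp(length_bp: int) -> int:
    """Number of whole codons in an ORF of the given length."""
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_bp // 3


if __name__ == "__main__":
    for start, text in find_exact_matches(TOY_PEPTIDE):
        print(start, text)
    core = codons_from_bp(3714)           # core ORF length from the abstract
    print(core, core + 139)               # core, and core plus the rhsA extension
```

Under these assumptions the 3,714-bp core corresponds to 1,238 codons, and the 139-codon rhsA extension brings the total to 1,377. The ratio of the two reported masses (156 to 141 kilodaltons) is close to the ratio of those codon counts, which is the kind of quick consistency check the numbers in the abstract allow.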