Hu Dalong, Reeves Peter R
School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia.
mSystems. 2020 Feb 11;5(1):e00705-19. doi: 10.1128/mSystems.00705-19.
Flagellin, the agent of prokaryotic flagellar motion, is very widely distributed and is the H antigen of serology. Flagellin molecules have a variable region that confers serotype specificity, encoded by the middle of the gene, and conserved regions encoded by the two ends of the gene. We collected all available prokaryotic flagellin protein sequences and found that variable-region diversity exists at two levels. In each species investigated, there are hypervariable region (HVR) forms with no detectable protein sequence homology between them. There is also considerable variation within HVR forms, indicating that some have been diverging for thousands of years and that interphylum horizontal gene transfers make a major contribution to the evolution of such atypical diversity.
Bacterial and archaeal flagellins are remarkable in having both a shared region with variation typical of housekeeping proteins and a region with extreme diversity, perhaps greater than for any other protein. Analysis of the 113,285 available full-gene flagellin sequences from published bacterial and archaeal sequence data revealed the nature and enormous extent of flagellin diversity. There were 35,898 unique amino acid sequences, which resolved into 187 clusters. Analysis of the flagellins of two species revealed that the variation occurs at two levels. The first is the division of the variable regions into sequence forms so divergent that there is no meaningful alignment even within a species; these forms corresponded to the H-antigen groups of those species. The second level is variation within these groups, which is extensive in both species. The shared sequence would allow PCR of the variable regions and thus strain-level analysis of microbiome DNA.