Hiller N Luisa, Janto Benjamin, Hogg Justin S, Boissy Robert, Yu Susan, Powell Evan, Keefe Randy, Ehrlich Nathan E, Shen Kai, Hayes Jay, Barbadora Karen, Klimke William, Dernovoy Dmitry, Tatusova Tatiana, Parkhill Julian, Bentley Stephen D, Post J Christopher, Ehrlich Garth D, Hu Fen Z
Allegheny General Hospital, Allegheny-Singer Research Institute, Center for Genomic Sciences, Pittsburgh, PA 15212, USA.
J Bacteriol. 2007 Nov;189(22):8186-95. doi: 10.1128/JB.00690-07. Epub 2007 Aug 3.
The distributed-genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium and that these pathogens utilize genetic recombination and a large, noncore set of genes as a means of diversity generation. We sequenced the genomes of eight nasopharyngeal strains of Streptococcus pneumoniae isolated from pediatric patients with upper respiratory symptoms and performed quantitative genomic analyses among these and nine publicly available pneumococcal strains. Coding sequences from all strains were grouped into 3,170 orthologous gene clusters, of which 1,454 (46%) were conserved among all 17 strains. The majority of the gene clusters, 1,716 (54%), were not found in all strains. Genic differences per strain pair ranged from 35 to 629 orthologous clusters, with each strain's genome containing between 21 and 32% noncore genes. The distribution of the orthologous clusters per genome for the 17 strains was entered into the finite-supragenome model, which predicted that (i) the S. pneumoniae supragenome contains more than 5,000 orthologous clusters and (ii) 99% of the orthologous clusters (approximately 3,000) that are represented in the S. pneumoniae population at frequencies of ≥0.1 can be identified if 33 representative genomes are sequenced. These extensive genic diversity data support the DGH and provide a basis for understanding the great differences in clinical phenotype associated with various pneumococcal strains. When these findings are taken together with previous studies that demonstrated the presence of a supragenome for Streptococcus agalactiae and Haemophilus influenzae, it appears that the possession of a distributed genome is a common host interaction strategy.
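The core/noncore bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `summarize_clusters`, the strain names, and the cluster IDs are hypothetical, and the sketch assumes that "genic differences per strain pair" means the number of clusters present in exactly one strain of the pair (the symmetric difference). It does not implement the finite-supragenome model.

```python
from itertools import combinations


def summarize_clusters(strain_clusters):
    """Summarize core and noncore orthologous clusters.

    strain_clusters: dict mapping a strain name to the set of
    orthologous cluster IDs found in that strain (hypothetical input).
    """
    all_sets = list(strain_clusters.values())
    # Clusters observed in at least one strain (the observed supragenome).
    observed = set().union(*all_sets)
    # Clusters observed in every strain (the core set).
    core = set.intersection(*all_sets)
    # Fraction of each strain's clusters that are not core.
    noncore_fraction = {
        name: len(clusters - core) / len(clusters)
        for name, clusters in strain_clusters.items()
    }
    # Assumption: pairwise genic difference = size of the symmetric difference.
    pairwise_differences = {
        (a, b): len(strain_clusters[a] ^ strain_clusters[b])
        for a, b in combinations(sorted(strain_clusters), 2)
    }
    return {
        "observed": observed,
        "core": core,
        "noncore_fraction": noncore_fraction,
        "pairwise_differences": pairwise_differences,
    }


# Toy data, invented for illustration only.
toy = {
    "strainA": {"c1", "c2", "c3", "c4"},
    "strainB": {"c1", "c2", "c3", "c5"},
    "strainC": {"c1", "c2", "c6"},
}
summary = summarize_clusters(toy)
```

Applied to the reported counts, the same bookkeeping gives 1,454 core plus 1,716 noncore clusters for the 3,170 total, that is, 1,454 / 3,170 ≈ 46% core.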