Choudhary M, Mackenzie C, Nereng K, Sodergren E, Weinstock GM, Kaplan S
Department of Microbiology & Molecular Genetics, The University of Texas Health Science Center, Houston, TX 77225, USA.
Department of Biochemistry & Molecular Biology, The University of Texas Health Science Center, Houston, TX 77225, USA.
Microbiology (Reading). 1997 Oct;143(Pt 10):3085-3099. doi: 10.1099/00221287-143-10-3085.
The photosynthetic bacterium Rhodobacter sphaeroides 2.4.1T has two chromosomes, CI (approximately 3.0 Mb) and CII (approximately 0.9 Mb). In this study a low-redundancy sequencing strategy was adopted to analyse 23 out of 47 cosmids from an ordered CII library. The sum of the lengths of these 23 cosmid inserts was approximately 495 kb, which comprised approximately 417 kb of unique DNA. A total of 1145 sequencing runs was carried out, with each run generating 559 ± 268 bases of sequence to give approximately 640 kb of total sequence. After editing, approximately 2.8% of the bases per run were estimated to be ambiguous. After the removal of vector and Escherichia coli sequences, the remaining approximately 565 kb of R. sphaeroides sequences were assembled, generating approximately 291 kb of unique sequences. BLASTX analysis of these unique sequences suggested that approximately 131 kb (45% of the unique sequence) had matches to either known genes or database ORFs of hypothetical or unknown function (dORFs). A total of 144 strong matches to the database was found; 101 of these matches represented genes encoding a wide variety of functions, e.g. amino acid biosynthesis, photosynthesis, nutrient transport, and various regulatory functions. Two rRNA operons (rrnB and rrnC) and five tRNAs were also identified. The remaining 160 kb of DNA sequence which did not yield database matches was then analysed using CODONPREFERENCE from the GCG package. This analysis suggested that 122 kb (42% of the total unique DNA sequence) could encode putative ORFs (pORFs), with the remaining 38 kb (13%) possibly representing non-coding intergenic DNA. From the data so far obtained, CII does not appear to be specialized for encoding any particular metabolic function, physiological state or growth condition. These data suggest that CII contains genes which are functionally as diverse as those found on any other bacterial chromosome and also contains sequences (pORFs) that may prove to be unique to this organism.
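The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration using the rounded values reported above; the variable names and the "redundancy" ratio (assembled input divided by unique sequence) are assumptions introduced here, not quantities defined by the authors.

```python
# Rounded values taken from the abstract (kb unless noted otherwise).
runs = 1145                 # number of sequencing runs
mean_read_bases = 559       # mean bases per run
assembled_input_kb = 565    # R. sphaeroides sequence after vector/E. coli removal
unique_kb = 291             # unique sequence after assembly
dorf_kb = 131               # unique sequence with BLASTX database matches
porf_kb = 122               # putative ORFs predicted by codon preference
noncoding_kb = 38           # possibly non-coding intergenic DNA


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


summary = {
    # Total raw sequence implied by runs x mean read length.
    "total_raw_kb": round(runs * mean_read_bases / 1000),
    # Hypothetical redundancy measure: assembled input per unit of unique sequence.
    "redundancy": round(assembled_input_kb / unique_kb, 2),
    "dorf_pct": pct(dorf_kb, unique_kb),
    "porf_pct": pct(porf_kb, unique_kb),
    "noncoding_pct": pct(noncoding_kb, unique_kb),
    # The three categories should account for all of the unique sequence.
    "categories_sum_kb": dorf_kb + porf_kb + noncoding_kb,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions the three sequence categories sum to the 291 kb of unique sequence, and the ratio of roughly 1.9 between assembled input and unique sequence is consistent with the strategy being described as low-redundancy.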