Capuano V, Galleron N, Pujic P, Sorokin A, Ehrlich S D
Laboratoire de Génétique Microbienne, Institut National de la Recherche Agronomique, Jouy-en-Josas, France.
Microbiology (Reading). 1996 Nov;142(Pt 11):3005-15. doi: 10.1099/13500872-142-11-3005.
Within the Bacillus subtilis genome sequencing project, the region between lysA and ilvA was assigned to our laboratory. In this report we present the sequence of the last 36 kb of this region, between the kdg operon and the attachment site of the SPβ prophage. A two-step strategy was used for the sequencing. In the first step, total chromosomal DNA was cloned in phage M13-based vectors, and the clones carrying inserts from the target region were identified by hybridization with a cognate yeast artificial chromosome (YAC) from our collection. Sequencing these clones allowed us to establish a number of contigs. In the second step, the contigs were mapped by Long Accurate (LA) PCR, and the remaining gaps were closed by sequencing the PCR products. The level of sequence inaccuracy due to LA PCR errors appeared to be about 1 in 10,000, which does not significantly affect the final sequence quality. This two-step strategy is efficient, and we suggest that it can be applied to the sequencing of longer chromosomal regions. The 36 kb sequence contains 38 coding sequences (CDSs), 19 of which encode unknown proteins. Seven genetic loci already mapped in this region (xpt, metB, ilvA, ilvD, thyB, dfrA and degR) were identified. Eleven CDSs were found to display significant similarities to known proteins in the data banks, suggesting possible functions for some of the novel genes: cspD may encode a cold shock protein; bcsA, the first bacterial homologue of chalcone synthase; exoI, a 5' to 3' exonuclease similar to that of DNA polymerase I of Escherichia coli; and bsaA, a stress-response-associated protein. The protein encoded by yplP is homologous to NifA-like transcriptional regulators. The arrangement of the genes relative to possible promoters and terminators suggests 19 potential transcription units.