Bolotin A, Mauger S, Malarme K, Ehrlich S D, Sorokin A
Génétique Microbienne, INRA, Jouy en Josas, France.
Antonie Van Leeuwenhoek. 1999 Jul-Nov;76(1-4):27-76.
Lactococcus lactis is an AT-rich Gram-positive bacterium phylogenetically close to the genus Streptococcus. Various strains of L. lactis are used in the dairy industry as starters for cheese making. L. lactis is also one of the well-characterized laboratory microorganisms, widely used for studies on the physiology of lactic acid bacteria. We describe here a low-redundancy sequence of the genome of the strain L. lactis IL1403. The strategy we followed to determine the sequence consists of two main steps. First, a limited number of plasmids and lambda phages carrying random segments of the genome were sequenced. Second, the sequences of the inserts were used to produce novel sequencing templates by applying Multiplex Long Accurate PCR protocols. These PCR products allowed us to determine the sequence of the entire 2.35 Mb genome with a very low redundancy, close to 2. The error rate of the sequence is estimated to be below 1%. The correctness of the sequence assembly was confirmed by PCR amplification of the entire L. lactis IL1403 genome, using a set of 266 oligonucleotides. The sequence was annotated using automatic gene prediction tools. This allowed us to identify 1495 protein-encoding genes, to locate them on the genome map and to classify their functions on the basis of homology to known proteins. The function of about 700 genes expected to encode proteins that lack homologs in databases cannot be reliably predicted in this way. The approach we used eliminates the high-redundancy sequencing and mapping efforts otherwise needed to obtain detailed and comprehensive genetic and physical maps of a bacterium. The availability of detailed genetic and physical maps of the L. lactis IL1403 genome provides many entry points for studying the metabolism and physiology of bacteria from this group. The presence of 42 copies of five different IS elements in the IL1403 genome confirms the importance of these elements for genetic exchange in lactococci.
These include two previously unknown elements, present at seven and fifteen copies and designated IS1077 and IS983, respectively. Five potential or rudimentary prophages were identified in the genome by detecting clusters of phage-related genes. The metabolic and regulatory potential of L. lactis was evaluated by inspecting gene sets classified into different functional categories. L. lactis has the genetic potential to synthesise the 20 standard amino acids, purine and pyrimidine nucleotides and at least four cofactors. Some of these metabolites, which are usually present in chemically defined media, can probably be omitted from such media. About twenty compounds can be used by L. lactis as a sole carbon source. Some 83 regulators were revealed, indicating a regulatory potential close to that of Haemophilus influenzae, a bacterium with a similar genome size. Unexpectedly, L. lactis has a complete set of late competence genes, which may have concerted transcriptional regulation and unleadered polycistronic mRNAs. These findings open new possibilities for developing genetic tools useful for studies of gene regulation in AT-rich Gram-positive bacteria and for engineering new strains for the dairy industry.