Kunai K, Machida M, Matsuzawa H, Ohta T
Eur J Biochem. 1986 Oct 15;160(2):433-40. doi: 10.1111/j.1432-1033.1986.tb09991.x.
The gene for L-lactate dehydrogenase (LDH) (EC 1.1.1.27) of Thermus caldophilus GK24 was cloned in Escherichia coli using synthetic oligonucleotides as hybridization probes. The nucleotide sequence of the cloned DNA was determined. The primary structure of the LDH was deduced from the nucleotide sequence. The deduced amino acid sequence agreed with the NH2-terminal and COOH-terminal sequences previously reported and the determined amino acid sequences of the peptides obtained from trypsin-digested T. caldophilus LDH. The LDH comprised 310 amino acid residues and its molecular mass was determined to be 32,808. On alignment of the whole amino acid sequences, the T. caldophilus LDH showed about 40% identity with the Bacillus stearothermophilus, Lactobacillus casei and dogfish muscle LDHs. The T. caldophilus LDH gene was expressed with the E. coli lac promoter in E. coli, which resulted in the production of the thermophilic LDH. The gene for the T. caldophilus LDH showed more than 40% identity with those for the human and mouse muscle LDHs on alignment of the whole nucleotide sequences. The G + C content of the coding region for the T. caldophilus LDH was 74.1%, which was higher than that of the chromosomal DNA (67.2%). The G + C contents in the first, second and third positions of the codons used were 77.7%, 48.1% and 95.5% respectively. The high G + C content in the third base caused extremely non-random codon usage in the LDH gene. About half (48.7%) the codons in the LDH gene started with G, and hence there were relatively high contents of Val, Ala, Glu and Gly in the LDH. The contents of Pro, Arg, Ala and Gly, which have high G + C contents in their codons, were also high. Rare codons with U or A as the third base were sometimes used to avoid the TCGA sequence, the recognition site for the restriction endonuclease, TaqI. Two TCGA sequences were found only in the sequence of CTCGAG (XhoI site) in the sequenced region of the T. caldophilus DNA. There were three segments with similar sequences in the two 5' non-coding regions, probably the promoter and ribosome-binding regions, of the genes for the T. caldophilus LDH and the Thermus thermophilus 3-isopropylmalate dehydrogenase.
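The codon-position statistics reported above (G + C content at the first, second and third codon positions, the share of codons starting with G, and TCGA sites occurring only inside CTCGAG) can be computed for any coding sequence along the following lines. This is a minimal sketch, not the authors' procedure: the function names `gc_by_codon_position` and `taqi_sites_outside_xhoi` are hypothetical, and the short `toy_cds` string is an invented illustration, not the T. caldophilus LDH sequence.

```python
def gc_by_codon_position(cds):
    """Return G+C percentages (overall and per codon position) for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")

    def gc_percent(bases):
        # Percentage of bases that are G or C.
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    first, second, third = cds[0::3], cds[1::3], cds[2::3]
    return {
        "overall": gc_percent(cds),
        "pos1": gc_percent(first),
        "pos2": gc_percent(second),
        "pos3": gc_percent(third),
        # Share of codons whose first base is G.
        "codons_starting_with_G": 100.0 * first.count("G") / len(first),
    }


def taqi_sites_outside_xhoi(dna):
    """Return 0-based positions of TCGA (TaqI) sites that are not embedded in CTCGAG (XhoI)."""
    dna = dna.upper()
    hits = []
    start = dna.find("TCGA")
    while start != -1:
        # A TCGA at `start` lies inside CTCGAG when flanked by C on the left and G on the right.
        embedded = start >= 1 and dna[start - 1:start + 5] == "CTCGAG"
        if not embedded:
            hits.append(start)
        start = dna.find("TCGA", start + 1)
    return hits


# Invented four-codon example (ATG GCC GTC GGG), for illustration only.
toy_cds = "ATGGCCGTCGGG"
stats = gc_by_codon_position(toy_cds)
```

An empty result from `taqi_sites_outside_xhoi` on a sequenced region would correspond to the situation described in the abstract, where TCGA appears only within XhoI sites.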