Yagüe E, Béguin P, Aubert J P
Département des Biotechnologies, Institut Pasteur, Paris, France.
Gene. 1990 Apr 30;89(1):61-7. doi: 10.1016/0378-1119(90)90206-7.
The complete nucleotide sequence of the celH gene of Clostridium thermocellum was determined. The open reading frame extended over a 2.7-kb DNA fragment and encoded a 900-amino acid (aa) protein (Mr 102,301) which hydrolyzes carboxymethylcellulose, p-nitrophenyl-beta-D-cellobioside, methylumbelliferyl-beta-D-cellobioside, barley beta-glucan, and larchwood xylan. The N terminus showed a typical signal peptide, and a cleavage site after Ser44 was predicted. Two Pro-Thr-Ser-rich regions divided the protein into three approximately equal domains. The central 328-aa region was similar to the N-terminal part, carrying the active site, of C. thermocellum endoglucanase E (EGE; 30.2%). The C-terminal region ended with two conserved 24-aa stretches showing close similarity to those previously described in EGA, EGB, EGD, EGE, EGX, and xylanase from C. thermocellum. Deletions of celH removing up to 327 codons from the 5' end and up to 245 codons from the 3' end of the coding sequence did not affect enzyme activity, confirming that the central domain was indeed responsible for catalytic activity. Production of truncated EGH in Escherichia coli was increased up to 120-fold by fusing fragments containing the 3' portion of the gene with the start of lacZ' present in pTZ19R.
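
The figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and not part of the published work. It uses only the numbers stated above, and the function names are hypothetical. It assumes one stop codon follows the 900 sense codons and, for the deletion check, that the two maximal deletions are considered together; the abstract does not say whether both were combined in a single construct.

```python
# Numbers taken directly from the abstract.
PROTEIN_LENGTH_AA = 900          # length of the encoded protein
MR = 102_301                     # reported relative molecular mass
MAX_5PRIME_DELETION_CODONS = 327 # largest tolerated 5' deletion
MAX_3PRIME_DELETION_CODONS = 245 # largest tolerated 3' deletion
CENTRAL_REGION_AA = 328          # size of the central region similar to EGE


def orf_length_bp(n_codons: int, include_stop: bool = True) -> int:
    """Length in base pairs of a reading frame with n_codons sense codons.

    Assumption: exactly one stop codon terminates the frame.
    """
    return 3 * (n_codons + (1 if include_stop else 0))


def remaining_codons(total: int, deleted_5prime: int, deleted_3prime: int) -> int:
    """Codons left if both deletions were applied to the same coding sequence."""
    return total - deleted_5prime - deleted_3prime


if __name__ == "__main__":
    # 900 codons plus a stop codon is 2703 bp, consistent with "2.7 kb".
    print("ORF length (bp):", orf_length_bp(PROTEIN_LENGTH_AA))

    # 900 - 327 - 245 = 328, the same size as the central region.
    left = remaining_codons(
        PROTEIN_LENGTH_AA,
        MAX_5PRIME_DELETION_CODONS,
        MAX_3PRIME_DELETION_CODONS,
    )
    print("Codons remaining after both maximal deletions:", left)
    print("Matches central region size:", left == CENTRAL_REGION_AA)

    # Average mass per residue implied by the reported Mr.
    print("Mean residue mass (Da):", round(MR / PROTEIN_LENGTH_AA, 1))
```

Under these assumptions the codons left after both maximal deletions number 328, the same size as the central region, which is consistent with the conclusion that the central domain alone carries the catalytic activity.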