Long G L, Chandra T, Woo S L, Davie E W, Kurachi K
Biochemistry. 1984 Oct 9;23(21):4828-37. doi: 10.1021/bi00316a003.
A 1434 base pair human liver cDNA coding for the entire alpha 1-antitrypsin protein has been isolated and sequenced. Translation of the coding region into amino acids reveals a precursor molecule which contains a 24 amino acid signal peptide and 394 amino acids present in the mature polypeptide chain. The human gene for the S variant of alpha 1-antitrypsin has also been subcloned and sequenced. The gene is composed of 10226 nucleotide bases and is approximately equimolar for all 4 nucleotides. The gene contains four intervening sequences (introns) and 5' and 3' noncoding regions which are 54 and 79 nucleotides in length, respectively. A 5.3-kilobase intron exists in the 5' noncoding region and contains a 143 amino acid open reading frame, an Alu family sequence, and a pseudo transcription initiation region. No significant differences in base composition are seen between the introns and those regions corresponding to coding regions of the corresponding mRNA (exons). A sequence of 1951 nucleotides flanking the 5' end of the gene has also been determined and contains a "TATA" box sequence (TTAAA-TA) 21 nucleotides upstream from the proposed transcription start site. Comparison of the gene sequence with the cDNA sequence reveals a single base substitution (A→T), which results in a Glu→Val substitution at position 264 in the S variant protein. The position and size of introns, the overall base composition, and the codon preference for the alpha 1-antitrypsin gene differ from those for the chicken ovalbumin gene even though the two proteins belong to a common protein family, as judged by amino acid sequence homology.
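As an illustration of the single base substitution described above, the following minimal sketch enumerates which glutamic acid codons can become valine codons through one A→T change, using the standard genetic code. The function name and data layout are hypothetical, and the abstract does not state which codon occurs at position 264, so the output lists the possibilities rather than the reported sequence. The precursor length is simply the sum of the two lengths given in the abstract.

```python
# Codons for the two amino acids involved, from the standard genetic code
# (written as DNA, coding strand).
GLU_CODONS = {"GAA", "GAG"}
VAL_CODONS = {"GTT", "GTC", "GTA", "GTG"}

# Lengths stated in the abstract.
SIGNAL_PEPTIDE_AA = 24
MATURE_CHAIN_AA = 394
PRECURSOR_AA = SIGNAL_PEPTIDE_AA + MATURE_CHAIN_AA  # 418 amino acids


def a_to_t_glu_to_val():
    """Return (glu_codon, position, val_codon) for every single A->T change
    that converts a Glu codon into a Val codon. Positions are 1-based."""
    results = []
    for codon in sorted(GLU_CODONS):
        for index, base in enumerate(codon):
            if base != "A":
                continue
            mutant = codon[:index] + "T" + codon[index + 1:]
            if mutant in VAL_CODONS:
                results.append((codon, index + 1, mutant))
    return results


if __name__ == "__main__":
    print("Precursor length (aa):", PRECURSOR_AA)
    for glu, position, val in a_to_t_glu_to_val():
        print(f"{glu} -> {val} (A->T at codon position {position})")
```

In both cases the change falls on the second base of the codon, which is consistent with a single A→T substitution being sufficient to produce the Glu→Val replacement.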