The structure and organization of a proline-rich protein gene of a mouse multigene family

One gene of the mouse proline-rich protein multigene family was cloned on a 3.6-kilobase pair EcoRI/BglII DNA fragment from a partial Sau3A bacteriophage library of CD-1 mouse chromosomal DNA. Phage harboring the gene were identified by plaque hybridization using 32P-labeled proline-rich protein cDNA inserts from clones pRP33 (rat) and pMP1 (mouse). The transcriptional unit comprises three exons separated by introns of 1434 base pairs (intron I) and 450 base pairs (intron II). The complete primary structure of the gene and its 5' and 3' flanking regions (3595 base pairs) was determined by the sequencing method of Maxam and Gilbert (Maxam, A.M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560). The DNA on the 5' side of exon I contains several sequences that may be involved in the induction and expression of this mouse gene. These include putative regulatory sites such as those considered to be inducible by cAMP and steroids, Z-DNA and enhancer sequences, and the expected TATAA and CAAT boxes. The mature protein coding region, exon II, is not interrupted by intron sequences. Exon III lies in the nontranslated region and contains the poly(A) addition site. The deduced amino acid sequence shows that the protein encoded by this gene contains 13 tandemly repeated regions, each 14 amino acids in length, with the prototype sequence PPPPGGPQPRPPQG. Each amino acid within the repeat has a favored codon. The consensus DNA sequence for each repeat is CCA CCA CCA CCA GGA GGC CCA CAG CCG AGA CCC CCT CAA GGC. The high degree of conservation of both nucleotide and amino acid sequences within the repeat region suggests that proline-rich protein genes likely evolved by duplication of a 42-base pair internal repeat.
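The relationship between the consensus DNA repeat and the prototype peptide can be checked with a short script. The sketch below is illustrative only: the names `CODON_TABLE`, `translate`, and the constants are assumptions introduced here, and the codon table is a deliberately partial subset of the standard genetic code covering just the codons in the consensus repeat. The total at the end assumes all 13 repeats have exactly the consensus length, which is an idealization.

```python
# Partial codon table (assumption: standard genetic code, restricted to
# the codons that occur in the consensus repeat quoted in the abstract).
CODON_TABLE = {
    "CCA": "P", "CCG": "P", "CCC": "P", "CCT": "P",  # proline
    "GGA": "G", "GGC": "G",                          # glycine
    "CAG": "Q", "CAA": "Q",                          # glutamine
    "AGA": "R",                                      # arginine
}

# Values taken directly from the abstract.
CONSENSUS_REPEAT = "CCA CCA CCA CCA GGA GGC CCA CAG CCG AGA CCC CCT CAA GGC"
PROTOTYPE = "PPPPGGPQPRPPQG"
N_REPEATS = 13


def translate(codons):
    """Translate a list of codons into a one-letter amino acid string."""
    return "".join(CODON_TABLE[codon] for codon in codons)


codons = CONSENSUS_REPEAT.split()
peptide = translate(codons)

# Each codon is 3 nucleotides, so 14 codons give the 42-base pair repeat.
repeat_bp = 3 * len(codons)

# Idealized size of the repeat region if every repeat matched the consensus length.
total_residues = N_REPEATS * len(peptide)
total_bp = N_REPEATS * repeat_bp

print(peptide, repeat_bp, total_residues, total_bp)
```

Under these assumptions the translated consensus matches the prototype sequence, and the 14 codons account for the 42-base pair repeat unit named in the abstract.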

The structure and organization of a proline-rich protein gene of a mouse multigene family.
