Foster DC, Yoshitake S, Davie EW.
Proc Natl Acad Sci U S A. 1985 Jul;82(14):4673-7. doi: 10.1073/pnas.82.14.4673.
A human genomic DNA library was screened for the protein C gene using a cDNA probe coding for the human protein. Three different overlapping lambda Charon 4A phage clones containing inserts from the protein C gene were isolated. The complete sequence of the gene was determined by the dideoxy method and shown to span about 11 kilobases of DNA. The coding and 3' noncoding portions of the gene consist of eight exons and seven introns. The eight exons code for a preproleader sequence of 42 amino acids, a light chain of 155 amino acids, a connecting Lys-Arg dipeptide, and a heavy chain of 262 amino acids. The preproleader sequence and the connecting dipeptide are removed during processing, yielding a mature protein composed of a heavy chain and a light chain held together by a disulfide bond. The heavy chain also contains the catalytic region of the serine protease. Two Alu sequences and two homologous repeats of about 160 nucleotides were found in intron E. The seven introns in the protein C gene are located in essentially the same positions in the amino acid sequence as the seven introns in the human factor IX gene, while the first three introns in the protein C gene are located in the same positions as the first three introns in the human prothrombin gene.
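The segment lengths reported above can be combined into a quick consistency check. The sketch below is illustrative arithmetic only and is not taken from the paper. It assumes the four segments are arranged N- to C-terminally in the order listed (preproleader, light chain, Lys-Arg, heavy chain). It also assumes that the "coding nucleotides" figure counts one stop codon and no untranslated sequence. The `Segment` class and the helper functions are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One region of the protein C precursor (hypothetical helper type)."""
    name: str
    length_aa: int
    removed_in_processing: bool


# Lengths as reported in the abstract; N- to C-terminal order is assumed.
PRECURSOR = [
    Segment("preproleader", 42, True),
    Segment("light chain", 155, False),
    Segment("connecting dipeptide (Lys-Arg)", 2, True),
    Segment("heavy chain", 262, False),
]


def precursor_length(segments):
    """Total residues encoded by the eight exons before processing."""
    return sum(s.length_aa for s in segments)


def mature_length(segments):
    """Residues left in the two disulfide-linked chains after processing."""
    return sum(s.length_aa for s in segments if not s.removed_in_processing)


def coding_nucleotides(segments, include_stop=True):
    """Coding length in nucleotides: 3 per codon, plus one stop codon (assumption)."""
    n = 3 * precursor_length(segments)
    return n + 3 if include_stop else n


if __name__ == "__main__":
    print(precursor_length(PRECURSOR))    # 461
    print(mature_length(PRECURSOR))       # 417
    print(coding_nucleotides(PRECURSOR))  # 1386
```

With these assumptions, the precursor has 42 + 155 + 2 + 262 = 461 residues. The mature two-chain protein retains 155 + 262 = 417 residues. The model deliberately ignores any further proteolysis after secretion. It also ignores how the coding sequence is split among the eight exons.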