Gibbs P E, Zielinski R, Boyd C, Dugaiczyk A
Biochemistry. 1987 Mar 10;26(5):1332-43. doi: 10.1021/bi00379a020.
The human alpha-fetoprotein gene spans 19,489 base pairs from the putative "Cap" site to the polyadenylation site. It is composed of 15 exons separated by 14 introns, which are symmetrically placed within the three domains of alpha-fetoprotein. In the 5' region, a putative TATAAA box is at position -21, and a variant sequence, CCAAC, of the common CAT box is at -65. Enhancer core sequences GTGGTTTAAAG are found in introns 3 and 4, and several copies of glucocorticoid response sequences AGATACAGTA are found on the template strand of the gene. There are six polymorphic sites within 4690 base pairs of contiguous DNA derived from two allelic alpha-fetoprotein genes. This amounts to a measured polymorphic frequency of 0.13%, or 6.4 × 10^-4 per site, which is about 5-10 times lower than values estimated from studies on polymorphic restriction sites in other regions of the human genome. There are four types of repetitive sequence elements in the introns and flanking regions of the human alpha-fetoprotein gene. At least one of these is apparently a novel structure (designated Xba) and is found as a pair of direct repeats, with one copy in intron 7 and the other in intron 8. It is conceivable that within the last 2 million years the copy in intron 8 gave rise to the repeat in intron 7. Their present location on both sides of exon 8 gives these sequences a potential for disrupting the functional integrity of the gene in the event of an unequal crossover between them. There are three Alu elements, one of which is in intron 4; the others are located in the 3' flanking region. A solitary Kpn repeat is found in intron 3. The Xba and Kpn repeats were only detected by complete sequencing of the introns. Neither X, Xba, nor Kpn elements are present in the related human albumin gene, whereas Alu elements are present in different positions.
From phylogenetic evidence, it appears that Alu elements were inserted into the alpha-fetoprotein gene at some time postdating the mammalian radiation 85 million years ago.
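The polymorphism figures quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the function and variable names are hypothetical, and the abstract does not state how the second figure was derived. Dividing the six sites by the 4690 aligned positions gives about 0.13%; dividing instead by the total nucleotides across both alleles (2 × 4690) happens to reproduce the quoted 6.4 × 10^-4 value, but that reading is an assumption, not something the abstract confirms.

```python
# Figures taken directly from the abstract.
POLYMORPHIC_SITES = 6
COMPARED_BP = 4690

# Assumption (not stated in the abstract): two alleles were compared,
# so the total number of nucleotides examined is 2 * COMPARED_BP.
ALLELES = 2


def frequency_per_position(sites: int, length_bp: int) -> float:
    """Polymorphic sites divided by the number of aligned positions."""
    return sites / length_bp


def frequency_per_nucleotide(sites: int, length_bp: int, n_alleles: int) -> float:
    """Polymorphic sites divided by all nucleotides examined across alleles."""
    return sites / (length_bp * n_alleles)


per_position = frequency_per_position(POLYMORPHIC_SITES, COMPARED_BP)
per_nucleotide = frequency_per_nucleotide(POLYMORPHIC_SITES, COMPARED_BP, ALLELES)

print(f"per aligned position: {per_position:.2%}")
print(f"per nucleotide examined: {per_nucleotide:.1e}")
```

Under these assumptions the first value rounds to the 0.13% given in the abstract and the second to 6.4 × 10^-4.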