Margot J B, Demers G W, Hardison R C
Department of Molecular and Cell Biology, Paul M. Althouse Laboratory, Pennsylvania State University, University Park 16802.
J Mol Biol. 1989 Jan 5;205(1):15-40. doi: 10.1016/0022-2836(89)90362-8.
The nucleotide sequence of the entire beta-like globin gene cluster of rabbits has been determined. This sequence of a continuous stretch of 44.5 × 10³ base-pairs (bp) starts about 6 × 10³ bp upstream from epsilon (the 5'-most gene) and ends about 12 × 10³ bp downstream from beta (the 3'-most gene). Analysis of the sequence reveals that: (1) the sequence is relatively A + T rich (about 60%); (2) regions with high G + C content are associated with OcC repeats, a short interspersed repeated DNA in rabbits; (3) the distribution of polypurines, polypyrimidines and alternating purine/pyrimidine tracts is not random within the cluster; (4) most open reading frames are associated with known globin coding regions, OcC repeats or long interspersed repeats (L1 repeats); (5) the most prominent open reading frames are found in the L1 repeats; (6) different strand asymmetries in base composition are associated with embryonic and adult genes as well as the tandem L1 repeats at the 3' end of the cluster; and (7) essentially all the repeats appear to have been inserted by a transposon mechanism. A comparison of the sequence with itself by a dot-plot analysis has revealed nine new members of the OcC family of repeats in addition to the six previously reported. The OcC repeats tend to be clustered, particularly in the epsilon-gamma and gamma-psi delta intergenic regions. Dot-plot comparisons between the rabbit and the human clusters have revealed extensive sequence matches. Homology starts about 6 × 10³ bp 5' to epsilon, or as far upstream as the rabbit sequence is available. It continues throughout the entire cluster and stops about 0.7 × 10³ bp 3' to beta, at which point several repeats have inserted in both rabbits and humans. Throughout the gene cluster, the homology is interrupted mainly by insertions or deletions in either the rabbit or the human genome. Almost all of the insertions are of known short or long repeated DNAs. The positions of the insertions are different in the two gene clusters, which indicates that both short and long repeats have been transposing throughout the genome for the time since the mammalian radiation. An alignment of rabbit and human sequences allows the calculation of the substitution rate around epsilon. Sequences far removed from the gene are evolving at a rate equivalent to the pseudogene rate, although some short regions show an apparently higher rate. (ABSTRACT TRUNCATED AT 400 WORDS)