Poole S J, Firtel R A
J Mol Biol. 1984 Jan 15;172(2):203-20. doi: 10.1016/s0022-2836(84)80038-8.
The discoidin I genes of Dictyostelium form a small, co-ordinately regulated multigene family. We have sequenced and compared the upstream regions of the DiscI-alpha, -beta and -gamma genes. For the most part the upstream regions of the three genes are non-homologous. The upstream sequences of the beta and gamma genes are exceedingly A + T-rich, while those of the alpha gene are less so. All three genes have a relatively G + C-rich region 20 to 40 base-pairs in length, found approximately 200 base-pairs 5' to the messenger RNA start site. This G + C-rich region 5' to the beta and gamma genes is flanked by short inverted repeats. Within this region, there is an 11 base-pair exact homology between the alpha and gamma genes, and a less perfect homology between these genes and the beta gene. The homology is flanked at a short distance by interspersed G and T residues. The gamma gene is greater than 90% A + T for greater than 800 base-pairs upstream. Further upstream there is a G + C-rich region that is also found inverted approximately 3.5 × 10^3 base-pairs away. The gamma and beta genes are tandemly linked, and the entire approximately 500 base-pair intergene region between the 3' end of the gamma gene and the 5' end of the beta gene is A + T-rich (approximately 90%) with the exception of the homology region 5' to the gamma gene. We demonstrate also the presence of a discoidin I pseudogene fragment having only 139 base-pairs of discoidin homology with greater than 8% mismatch. It is flanked upstream by five 39 base-pair G + C-rich repeats, and downstream by sequences that are extremely A + T-rich. We discuss the possible significance of the conserved G + C-rich structures on discoidin I gene expression.