Paulsson G, Bernholm K, Wieslander L
Department of Molecular Genetics, Medical Nobel Institute, Karolinska Institutet, Stockholm, Sweden.
J Mol Evol. 1992 Sep;35(3):205-16. doi: 10.1007/BF00178596.
The four Balbiani ring (BR) genes, BR1, BR2.1, BR2.2, and BR6, in the midge Chironomus tentans constitute a gene family encoding secretory proteins with molecular weights of approximately 10⁶ daltons. The major part of each gene is known to consist of tandemly organized composite repeat units, resulting in a hierarchic repeat arrangement. Here, we present the sequence organization of the 5' part of the BR2.2 and BR6 genes and describe the entire transcribed part of the two genes. Because the BR1 and BR2.1 genes were also fully characterized recently, this allows a comparison of all genes in the BR gene family. All four genes share the same exon-intron structure and have evolved by gene duplications starting from a common ancestor that had the same overall organization as the BR genes of today. The genes encode proteins that have an extended central domain of approximately 10,000 amino acid residues, flanked by a highly charged amino-terminal domain of approximately 200 residues and a globular carboxy-terminal domain of 110 residues. Exons 1-3 and the beginning of exon 4 encode the amino-terminal domain, which throughout contains many regions built from short repeats. These repeats are often degenerate in both repeat unit and sequence, and they are present in different numbers in the different genes. In several instances, however, these repeat structures are conserved at the protein level, where they form positively or negatively charged regions. Each BR gene has a 26-38-kb-long exon 4, which consists of an array of 125-150 repeat units and encodes the central domain. The number of repeat units appears to be largely preserved by selection, and all repeat units in the array are very efficiently homogenized. Occasionally, variant repeats have been introduced, presumably from another BR gene by gene conversion, and have spread within the array. Introns 1-3 at the 5' end of the genes have diverged extensively in sequence and length between the genes. In contrast, intron 4 at the 3' end is virtually identical in three of the four genes, suggesting that gene conversion homogenizes the 3' ends of the genes but not the 5' ends.