Park Y S, Kramer J M
University of Illinois, Department of Biological Sciences, Chicago 60680.
J Mol Biol. 1990 Jan 20;211(2):395-406. doi: 10.1016/0022-2836(90)90360-X.
Caenorhabditis elegans contains 50 to 150 collagen genes dispersed throughout its genome. We have determined the complete nucleotide sequences of two collagen genes, col-12 and col-13, that are separated by only 1800 bases and are transcribed in the same direction. The 951 nucleotides of their coding regions differ by only five nucleotides (99.5% identity). The amino acid sequences are identical except for two conservative amino acid changes within the putative secretory signal sequences, so the mature forms of the col-12 and col-13 collagens would be identical. The position and sequence of the intron (52 base-pairs) within the coding region of each gene are perfectly conserved. In contrast to the coding regions and the introns, the 5' and 3' flanking regions show little sequence similarity. The col-12 and col-13 genes are expressed at similar levels at the same developmental stages, and appear to utilize conserved TATA boxes and transcription start sites. The major difference between the genes is that, preceding the initiator ATG, col-12 has a cis-spliced intron, while col-13 is trans-spliced. Thus, col-12 and col-13 are essentially identical in all aspects except that the col-12 mRNA has a 26-nucleotide cis-spliced leader at the same place where the col-13 mRNA has a 22-nucleotide trans-spliced leader. These results suggest that col-12 and col-13 are derived from a gene duplication and that sequence homology in the coding regions, but not in the flanking regions, has been maintained by gene conversion. The fact that the only significant difference between the two genes is in their modes of splicing suggests that cis- and trans-splicing can be interchanged during gene evolution.