Strehler E E, Strehler-Page M A, Perriard J C, Periasamy M, Nadal-Ginard B
J Mol Biol. 1986 Aug 5;190(3):291-317. doi: 10.1016/0022-2836(86)90003-3.
The complete nucleotide sequence and exon/intron structure of the rat embryonic skeletal muscle myosin heavy chain (MHC) gene have been determined. This gene comprises 24 × 10³ bases of DNA and is split into 41 exons. The exons encode a 6035 nucleotide (nt) long mRNA consisting of 90 nt of 5' untranslated, 5820 nt of protein-coding and 125 nt of 3' untranslated sequence. The rat embryonic MHC polypeptide is encoded by exons 3 to 41 and contains 1939 amino acid residues with a calculated Mr of 223,900. Its amino acid sequence displays the structural features typical of all sarcomeric MHCs, i.e. an amino-terminal "globular" head region and a carboxy-terminal alpha-helical rod portion that shows the characteristics of a coiled coil with a superimposed 28-residue repeat pattern interrupted at only four positions by "skip" residues. The complex structure of the rat embryonic MHC gene and the conservation of intron locations in this and other MHC genes are indicative of a highly split ancestral sarcomeric MHC gene. Introns in the rat embryonic gene interrupt the coding sequence at the boundaries separating the proteolytic subfragments of the head, but not at the head/rod junction or between the 28-residue repeats present within the rod. Therefore, there is little evidence for exon shuffling and intron-dependent evolution by gene duplication as a mechanism for the generation of the ancestral MHC gene. Rather, either intron insertion into a previously non-split ancestral MHC rod gene consisting of multiple tandemly arranged 28-residue-encoding repeats, or convergent evolution of an originally non-repetitive ancestral MHC rod gene, must account for the observed structure of the rod-encoding portion of present-day MHC genes.