Parma J, Christophe D, Pohl V, Vassart G
Institut de Recherche Interdisciplinaire, Faculté de Médecine, Université Libre de Bruxelles.
J Mol Biol. 1987 Aug 20;196(4):769-79. doi: 10.1016/0022-2836(87)90403-7.
More than one third of thyroglobulin (1190 residues out of 2750) is made of one peptide motif repeated ten times in tandem. Segments unrelated to the motif interrupt this structure at various places. The corresponding gene region, which extends over 40 × 10³ bases, was studied in detail. All exon borders and exon/intron junctions were localized precisely and sequenced, and their positions were correlated with the repetitive organization of the protein. When intron positions were compiled on a consensus sequence of all repeats, three categories of introns were observed. Except between repeats no. 5 and 6, an intron was invariably found within the Cys codon that forms the boundary of each motif. This category of intron most probably reflects the serial duplication events responsible for the evolution of this region of the gene. All other introns, except no. 2, are found at positions where the repetitive structure is disrupted by "inserted" peptides. We present the hypothesis that this second category of introns was already present in the original unit before the first duplication. Thereafter, these introns would have experienced either complete loss (some units do not contain any intron) or partial or total exonization, resulting in the slipping of intronic material into coding sequence. Finally, intron no. 2 splits motif no. 1 at the boundary between two segments presenting sequence homology. This last type of intron probably reflects an initial duplication event at the origin of a primordial thyroglobulin gene motif. With all these characteristics, the thyroglobulin gene is presented as a paradigm for the analysis of the fate of introns in gene evolution.