Burke W D, Eickbush T H
J Mol Biol. 1986 Aug 5;190(3):343-56. doi: 10.1016/0022-2836(86)90006-9.
The 140 × 10³ base late chorion locus of Bombyx mori contains two 15-member multigene families, the high-cysteine A (HcA) and high-cysteine B (HcB) families, arranged in tightly linked pairs that are divergently transcribed. Previous DNA hybridization experiments have indicated that all members of these gene families contain a complex pattern of shared sequence variation. The sequence analysis in this paper, involving all 15 gene pairs, allows a comprehensive examination of the nature of this variation. Average sequence homology between gene pairs is: 95% for the protein-encoding regions; 93% for the common 272 base-pair 5' flanking region; 87% for the introns; and 88% for the 3' untranslated regions. Considering the great degree of sequence homology in the coding regions, an unexpectedly high level of variation is found in the deduced protein sequences. Over 50% of the nucleotide substitutions in the protein-encoding regions lead to amino acid replacements, most of which involve a change in charge or affect the secondary structure of the protein. In addition, significant differences in length between the proteins occur in the carboxyl-terminal arm. In both families, the major portion of this arm is composed of Cys-Gly-Gly and Cys-Gly subrepeats forming a (Cys-Gly-Gly)2-(Cys-Gly)2 major repeat. Differences in the number of complete and partial repeats result in deduced protein sequences that contain arms varying from 32 to 54 amino acid residues for members of the HcA family and 14 to 88 residues for the HcB family. The high level of variation in protein composition indicates a lack of strong selective pressure. We suggest that the high level of DNA sequence homology maintained by these genes, in the coding as well as in the non-coding regions, is the result of sequence exchange between family members.
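The repeat arithmetic in the abstract can be illustrated with a minimal sketch. It assumes only the subrepeat structure stated above, (Cys-Gly-Gly)2-(Cys-Gly)2, written in one-letter amino acid codes. The function name `repeat_region` and the repeat counts in the loop are hypothetical and are not taken from the paper's sequences. The abstract says the repeats form the major portion of the arm, not all of it, so these lengths are not meant to reproduce the reported 32 to 54 or 14 to 88 residue ranges.

```python
# Subrepeats named in the abstract, as one-letter amino acid codes.
CGG = "CGG"  # Cys-Gly-Gly
CG = "CG"    # Cys-Gly

# Major repeat (Cys-Gly-Gly)2-(Cys-Gly)2: 2*3 + 2*2 = 10 residues.
MAJOR_REPEAT = CGG * 2 + CG * 2


def repeat_region(complete: int, partial: str = "") -> str:
    """Build a hypothetical repeat region.

    `complete` is the number of whole major repeats; `partial` is an
    optional trailing fragment that must be a prefix of the major repeat.
    Both values are illustrative assumptions, not data from the paper.
    """
    if complete < 0:
        raise ValueError("complete must be non-negative")
    if not MAJOR_REPEAT.startswith(partial):
        raise ValueError("partial must be a prefix of the major repeat")
    return MAJOR_REPEAT * complete + partial


if __name__ == "__main__":
    # Hypothetical combinations of complete and partial repeats.
    for complete, partial in [(1, ""), (3, "CGG"), (5, "CGGCGG")]:
        region = repeat_region(complete, partial)
        print(complete, partial or "-", len(region), region.count("C"))
```

Changing the number of complete repeats shifts the region length in steps of 10 residues, while a partial repeat adds a smaller increment. This is one simple way to picture how the repeat copy number alone could produce arms of quite different lengths.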