Bashford D, Chothia C, Lesk A M
MRC Laboratory of Molecular Biology, Cambridge, England.
J Mol Biol. 1987 Jul 5;196(1):199-216. doi: 10.1016/0022-2836(87)90521-3.
The three-dimensional structures of globins are known, from crystallographic analyses, to be very similar. Their amino acid sequences, however, differ greatly. Only two residues are absolutely conserved in all sequences, and the residue identities of some pairs of sequences are only 16%. We have determined the nature and exact extent of the sequence variations and the extent to which the conserved features of the globin sequences are unique to this family. The 226 globin sequences now known were aligned and analysed. Because distantly related protein sequences cannot be aligned correctly without the use of structural data, we developed a method that incorporated structural information into the alignment procedure. Analysis of the aligned sequences shows that: (1) Although individual chains vary in size between 132 and 157 residues, deletions and insertions result in there being only 102 residue sites common to all globins. These sites form six separate regions. Insertions and deletions between these regions mean that their separations can vary in different sequences. (2) Within the conserved regions there are 32 sites that almost always contain hydrophobic residues. In the known structures, these sites are in the protein interior. We measured the variations in the size of the residues that occur in the 226 sequences at these sites. At six sites the residues differ in size by less than 40 Å³, at 11 sites they differ by 40 to 100 Å³, and at 15 sites they differ by more than 100 Å³. There are two other conserved buried sites: one contains the His linked to the haem iron and the other usually contains a His involved with the haem ligand. (3) Within the conserved regions there are another 32 sites that are almost always occupied by charged, polar or small non-polar (Gly or Ala) residues. In the known structures, these sites are on the protein surface.
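The size comparison in point (2) can be illustrated with a minimal sketch. It assumes that the size variation at a site is the difference between the largest and smallest residue volumes observed there, which the abstract does not state explicitly. The volume table holds approximate, illustrative values in Å³ and is not taken from the paper; the names RESIDUE_VOLUME, size_spread and size_class are hypothetical.

```python
# Approximate residue volumes in cubic angstroms. Illustrative values only:
# the abstract does not say which volume scale the authors used.
RESIDUE_VOLUME = {
    "A": 88.6, "V": 140.0, "L": 166.7, "I": 166.7,
    "M": 162.9, "F": 189.9, "W": 227.8,
}


def size_spread(residues):
    """Largest minus smallest volume among the residues seen at one site."""
    volumes = [RESIDUE_VOLUME[r] for r in residues]
    return max(volumes) - min(volumes)


def size_class(residues):
    """Assign a site to one of the three ranges quoted in the abstract."""
    spread = size_spread(residues)
    if spread < 40:
        return "<40"
    if spread <= 100:
        return "40-100"
    return ">100"
```

Under these assumptions, a site that only ever holds Leu or Ile falls in the first range, while a site that holds both Ala and Trp falls in the last.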
To determine the extent to which the conserved features found for the globin sequences are unique to that protein family, the following procedure was used. The six conserved regions, and the residue restrictions that occur at the 66 sites within these regions, were encoded into two "templates". One was based only on the sequences so far determined; the other was extended to include as yet unobserved substitutions that seemed plausible on the basis of size, hydrophobicity and polarity. Each of the 3286 non-globin sequences in the data bank was then examined by a computer program to see how closely it could be matched to these templates. (ABSTRACT TRUNCATED AT 400 WORDS)
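The abstract does not describe the matching program itself, so the following is only a sketch of one way such a template search could work. It assumes a template is an ordered list of regions, each region a list of allowed-residue sets (one per site), and that regions must appear in order without overlapping while the gaps between them are free to vary. The scoring rule (number of sites whose residue is allowed) and the names region_hits and best_template_score are hypothetical.

```python
def region_hits(region, seq, start):
    """Count sites of `region` whose allowed set contains the residue
    found at the corresponding position of `seq`, starting at `start`."""
    return sum(seq[start + i] in allowed for i, allowed in enumerate(region))


def best_template_score(template, seq):
    """Best total number of matched sites when the regions are placed
    in order, without overlap, with gaps of any length between them.
    Returns -inf if the sequence is too short to hold every region."""
    n = len(seq)
    neg = float("-inf")
    # best[p]: best score for the regions placed so far, all ending at or before p
    best = [0] * (n + 1)
    for region in template:
        m = len(region)
        new = [neg] * (n + 1)
        for end in range(m, n + 1):
            start = end - m
            placed = best[start] + region_hits(region, seq, start)
            new[end] = max(new[end - 1], placed)
        best = new
    return best[n]


# Toy template with two regions (three sites in total), for illustration only.
toy_template = [[set("LIV"), set("AG")], [set("H")]]
score = best_template_score(toy_template, "MLAKKHG")
```

Dividing the score by the total number of template sites gives a match fraction that could be compared across the sequences in a data bank; the threshold for calling a sequence globin-like would be a separate choice not covered here.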