Srinivasan N, Blundell T L
Department of Crystallography, Birkbeck College, University of London, UK.
Protein Eng. 1993 Jul;6(5):501-12. doi: 10.1093/protein/6.5.501.
A 3-D model of a protein can be constructed from its amino acid sequence and the 3-D structures of one or more homologues by annealing three sets of fragments: the structurally conserved regions, the structurally variable regions and the side chains. The method, encoded in the computer program COMPOSER, was assessed by generating 3-D models of eight proteins whose crystal structures are already known and for which 3-D structures of homologues are available. In the structurally conserved regions, the differences between modelled and X-ray structures are smaller than the differences between the X-ray structures of the modelled protein and the homologues used to build the model. When several homologues are used, the contributions of the known structures are weighted, preferably by the square of sequence similarity; this is especially important when the similarities of the homologues to the modelled structure differ greatly. The 'collar' extension approach, in which a similar region of different length in a homologue is used to extend the framework, can result in a more accurate model. If the known homologues comprise more than one related group of proteins and all are only distantly related to the unknown, then aligning the sequence to be modelled with each group of homologues facilitates identification of the structurally conserved regions of the unknown and leads to an improved model. For seven of the eight models, the root mean square differences (r.m.s.d.s) from the structures defined by X-ray analysis are between 0.73 and 1.56 Å for all Cα atoms. For the model of mucor pepsin, where the closest homologue has 33% sequence identity and 20% of the residues are in structurally variable regions, the r.m.s.d. for the framework region is 1.71 Å and the r.m.s.d. for all Cα atoms is 3.47 Å.
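The weighting and the r.m.s.d. measure described in the abstract can be illustrated with a minimal Python sketch. It is not COMPOSER's implementation: the function names are hypothetical, the homologue structures are assumed to be already superposed with equivalent Cα atoms listed in the same order, sequence similarities are assumed to be given as fractions, and the framework is taken here to be a simple weighted mean of equivalent positions.

```python
import math


def similarity_weights(similarities):
    """Return weights proportional to the square of each sequence similarity."""
    squares = [s * s for s in similarities]
    total = sum(squares)
    return [q / total for q in squares]


def weighted_framework(structures, similarities):
    """Weighted mean of equivalent C-alpha positions (assumed pre-superposed).

    structures: one list of (x, y, z) tuples per homologue, same atom order.
    similarities: sequence similarity of each homologue to the target.
    """
    weights = similarity_weights(similarities)
    n_atoms = len(structures[0])
    framework = []
    for i in range(n_atoms):
        point = tuple(
            sum(w * s[i][axis] for w, s in zip(weights, structures))
            for axis in range(3)
        )
        framework.append(point)
    return framework


def rmsd(coords_a, coords_b):
    """Root mean square difference between two equal-length coordinate sets.

    No superposition is performed; both sets must share a reference frame.
    """
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy data (invented for illustration, not taken from the paper).
homologue_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
homologue_2 = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = weighted_framework([homologue_1, homologue_2], [0.8, 0.4])
deviation = rmsd(model, homologue_1)
```

With similarities of 0.8 and 0.4, squaring gives the closer homologue four times the weight of the more distant one rather than twice, which is why the choice matters most when the homologues differ greatly in similarity to the target.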