Reid Lorne S
Allelix Biopharmaceuticals, 6850 Goreway Drive, Mississauga, Ontario, Canada L4V 1P1.
J Res Natl Inst Stand Technol. 1989 Jan-Feb;94(1):65-78. doi: 10.6028/jres.094.009.
The procedures used to model a protein structure are well established when the novel protein has high sequence similarity to a protein of known structure. Many proteins of interest, however, have low (i.e., <50%) sequence similarity to any known structure, and in these cases new approaches to structure prediction are required. The use of sequence profiles, which relate sequence to known structure, has been proposed as one method of assigning local regions of structure. As a first stage, templates or "icons" of the many relevant substructural motifs found in proteins must be defined. The sequences that gave rise to these structures are then aligned and a weighted profile is obtained. Average structures of the 8- and 12-residue helix-turn (HT8, HT12) and turn-helix (TH8, TH12) motifs have been prepared. These coordinate templates were then used to scan the Brookhaven protein structural database for similar, superimposable fragments. For each element, a composite template of 100 similar fragments was found to be internally consistent, with an rmsd of 0.92 Å for HT8, 1.54 Å for HT12, 0.41 Å for TH8 and 1.40 Å for TH12. All of the sequences from these structures were then used to create an overall sequence profile for each motif. The four sequence profiles were scanned against the amino acid sequences of the proteins in the Brookhaven database; tertiary structure was correctly identified only about 10% of the time. This value is too low for predictive purposes. However, it could be increased by checking for multiple occurrences of the template in one protein.
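The profile-and-scan step described above can be illustrated with a minimal sketch. It is not the weighting scheme used in the paper, which the abstract does not specify. It assumes a simple position-specific log-odds profile with a uniform background and a pseudocount, and the function names (`build_profile`, `scan`, `best_hit`) and the toy fragments are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(fragments, pseudocount=1.0):
    """Build a position-specific log-odds profile from equal-length fragments.

    Assumptions (not from the source): uniform background frequencies and
    an additive pseudocount so unseen residues get a finite score.
    """
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("all fragments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(fragments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(f[pos] for f in fragments)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        profile.append(column)
    return profile


def scan(profile, sequence):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        hits.append((start, score))
    return hits


def best_hit(profile, sequence):
    """Return the highest-scoring (start, score) window."""
    hits = scan(profile, sequence)
    if not hits:
        raise ValueError("sequence is shorter than the profile")
    return max(hits, key=lambda hit: hit[1])


# Toy 8-residue fragments (hypothetical, not taken from the database).
toy_fragments = ["AEELLKKG", "AEELLKRG", "SEELLKKG"]
toy_profile = build_profile(toy_fragments)
start, score = best_hit(toy_profile, "GGGGAEELLKKGPPPP")
print(start, round(score, 2))
```

In practice one would apply a score threshold rather than take only the best window, and, as the abstract suggests, require several hits within one protein before accepting an assignment.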