Johnson M S, Overington J P, Blundell T L
Department of Crystallography, Birkbeck College, University of London, U.K.
J Mol Biol. 1993 Jun 5;231(3):735-52. doi: 10.1006/jmbi.1993.1323.
We introduce an approach to protein comparisons in which tertiary-structure information is exploited in the alignment of a protein sequence of known tertiary structure, or an aligned set of sequences of known homologous structures, with one or more sequences. The local tertiary environments of residues in the one or more three-dimensional structures (defined in terms of residue accessibility to solvent, secondary structure and hydrogen bonding) are used to select position-specific amino acid substitution scores and produce a scoring template suitable for aligning sequences or searching sequence data banks. The amino acid substitution scores have been accumulated from 72 families of protein structures in which the observed substitutions have been classified according to features of the local structure. Hence, the value attributed to a particular amino acid interchange in the template is not a constant, but is dependent upon the environmental context in which that substitution has occurred. We have used these structural templates to align proteins, as well as to search an amino acid sequence data bank for proteins having a similar fold. Indeed, a database of templates that corresponds to both unique structures and aligned homologous structures from the Brookhaven Protein Data Bank has been produced. A new sequence can be searched against the database of templates in order to identify a similar tertiary fold even if the sequence is not significantly similar to any proteins of known three-dimensional structure.
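The central idea, that the score for an amino acid interchange depends on the structural environment of the position, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-feature environment labels, the four-letter alphabet, the toy score values, the linear gap penalty, and the function names `build_template` and `align_score` are hypothetical and are not taken from the paper, whose tables were derived from 72 structural families and also use hydrogen bonding. The alignment step is a standard Needleman-Wunsch global alignment, substituted here as a plain stand-in for whatever procedure the authors used.

```python
from typing import Dict, List, Tuple

# Hypothetical environment label: (solvent accessibility, secondary structure).
Env = Tuple[str, str]

# Toy environment-specific substitution scores (illustrative values only).
# TOY_SCORES[env][(structure_residue, query_residue)] -> score
TOY_SCORES: Dict[Env, Dict[Tuple[str, str], float]] = {
    ("buried", "helix"): {("L", "L"): 4.0, ("L", "I"): 2.0, ("L", "K"): -3.0},
    ("exposed", "helix"): {("L", "L"): 2.0, ("L", "I"): 1.0, ("L", "K"): 0.5},
    ("exposed", "coil"): {("G", "G"): 5.0, ("G", "I"): -2.0,
                          ("G", "K"): -1.0, ("G", "L"): -2.0},
}
DEFAULT_SCORE = -1.0   # assumed score for pairs missing from the toy tables
ALPHABET = "GIKL"      # reduced alphabet to keep the example small


def build_template(structure_seq: str, envs: List[Env]) -> List[Dict[str, float]]:
    """Return one row per structure position, giving the score of placing
    each possible query residue at that position in its environment."""
    template = []
    for residue, env in zip(structure_seq, envs):
        table = TOY_SCORES.get(env, {})
        row = {aa: table.get((residue, aa), DEFAULT_SCORE) for aa in ALPHABET}
        template.append(row)
    return template


def align_score(template: List[Dict[str, float]], query: str,
                gap: float = -2.0) -> float:
    """Global alignment score of a query sequence against the template
    (Needleman-Wunsch with a linear gap penalty)."""
    n, m = len(template), len(query)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + template[i - 1].get(query[j - 1], DEFAULT_SCORE)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


# A three-residue "structure" whose two leucines sit in different environments.
envs = [("buried", "helix"), ("exposed", "helix"), ("exposed", "coil")]
template = build_template("LLG", envs)

for query in ("LLG", "LKG", "KLG"):
    print(query, align_score(template, query))
```

With these toy values, replacing the exposed leucine by lysine costs far less than replacing the buried one, so `LKG` scores higher than `KLG` against the same template even though both differ from the structure's sequence by a single residue. Searching a sequence data bank would amount to scoring every sequence against the template in this way and ranking the results.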