Rodionov M A, Johnson M S
Department of Crystallography, Birkbeck College, University of London, United Kingdom.
Protein Sci. 1994 Dec;3(12):2366-77. doi: 10.1002/pro.5560031221.
We report the derivation of scores that are based on the analysis of residue-residue contact matrices from 443 3-dimensional structures aligned structurally as 96 families, which can be used to evaluate sequence-structure matches. Residue-residue contacts and the more than 3 × 10⁶ amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (approximately 75%) even when the sequence identity approaches 10%. In a comparison involving a globin structure and the search of a sequence databank (>21,000 sequences), the contact probability scores are shown to provide a very powerful secondary screen for the top-scoring sequence-structure matches, where between 69% and 84% of the unrelated matches are eliminated. The search of an aligned set of 2 globins against a sequence databank and the subsequent residue contact-based evaluation of matches locates all 618 globin sequences before the first non-globin match. From a single bacterial serine proteinase structure, the structural template approach coupled with residue-residue contact substitution data leads to the detection of the mammalian serine proteinase family among the top matches in the search of a sequence databank.
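As an illustration of what contact-map conservation between two aligned structures means, the following is a minimal sketch, not the authors' procedure. The abstract does not give the contact definition, so the distance cutoff, the use of one coordinate per residue, the minimum sequence separation, and the function names `contact_map` and `conserved_fraction` are all assumptions made for this example. The coordinates are toy data, not taken from any real structure.

```python
import math


def contact_map(coords, cutoff=6.0, min_separation=2):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    Assumption: a contact is any pair of residues at least
    `min_separation` apart in sequence whose representative points
    lie within `cutoff` of each other. The paper may use a
    different definition.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def conserved_fraction(contacts_a, contacts_b, alignment):
    """Fraction of contacts in A that are also contacts in B.

    `alignment` maps residue indices of A to the structurally
    aligned residue indices of B. Contacts involving an unaligned
    residue are ignored.
    """
    mapped = 0
    conserved = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            mapped += 1
            a, b = alignment[i], alignment[j]
            if (min(a, b), max(a, b)) in contacts_b:
                conserved += 1
    return conserved / mapped if mapped else 0.0


# Toy data: a six-residue hairpin, one point per residue.
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
            (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
# Same fold, but the last residue has moved away from the hairpin.
coords_b = coords_a[:5] + [(0.0, 12.0, 0.0)]

contacts_a = contact_map(coords_a)
contacts_b = contact_map(coords_b)
alignment = {i: i for i in range(6)}  # identity alignment for the toy case

fraction = conserved_fraction(contacts_a, contacts_b, alignment)
print(f"{len(contacts_a)} contacts in A, {fraction:.0%} conserved in B")
```

In this toy case, four of the six contacts in the first structure are preserved in the second. Restricting the count to aligned positions mirrors the abstract's focus on contacts at aligned positions within a family, though the real analysis also tabulates substitutions and solvent accessibility, which this sketch omits.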