Miyazawa S, Jernigan R L
Faculty of Technology, Gunma University, Kiryu, Gunma 376, Japan and Room B-116, Bldg 12B, MSC 5677, Laboratory of Experimental and Computational Biology, DBS, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892-5677, USA.
Protein Eng. 2000 Jul;13(7):459-75. doi: 10.1093/protein/13.7.459.
We examine how effectively previously developed simple potential functions can identify compatibilities between protein sequences and structures in database searches. The potential function consists of pairwise contact energies, repulsive packing potentials that penalize overly dense arrangements of residues, and short-range potentials for secondary structures, all of which were estimated from statistical preferences observed in known protein structures. Each potential energy term was modified to represent compatibilities between sequences and structures for globular proteins. Pairwise contact interactions in a sequence-structure alignment are evaluated in a mean field approximation on the basis of the probabilities that site pairs are aligned. Gap penalties are assumed to be proportional to the number of contacts at each residue position; as a result, gaps are placed more frequently on protein surfaces than in cores. In addition to minimum energy alignments, we use probability alignments, made by successively aligning site pairs in order of their pairwise alignment probabilities. The results show that the present energy function and alignment method can effectively detect both folds compatible with a given sequence and, inversely, sequences compatible with a given fold, and that they yield mostly similar alignments for these two types of sequence and structure pairs. Probability alignments consisting of only the most reliable site pairs can yield extremely small root mean square deviations, and including less reliable pairs increases the deviations. We also observe that secondary structure potentials are a useful complement, yielding improved alignments with this method. Remarkably, this method detects some individual sequence-structure pairs that have only 5-20% sequence identity.
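The two alignment ideas in the abstract, gap penalties proportional to contact counts and probability alignments built from the most reliable site pairs first, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `scale` constant, the `threshold` cutoff, and the rule that an accepted pair must preserve sequence order relative to all previously accepted pairs are assumptions made here for illustration only. The abstract does not specify these details.

```python
from typing import Dict, List, Tuple


def contact_gap_penalties(contact_counts: List[int], scale: float = 1.0) -> List[float]:
    """Return one gap penalty per structure position.

    Each penalty is proportional to the number of contacts at that position,
    so buried (high-contact) positions are costlier to gap than surface ones.
    `scale` is a hypothetical proportionality constant.
    """
    return [scale * n for n in contact_counts]


def probability_alignment(
    pair_probs: Dict[Tuple[int, int], float],
    threshold: float = 0.0,
) -> List[Tuple[int, int]]:
    """Greedily build an alignment from site-pair alignment probabilities.

    Site pairs (sequence index i, structure index j) are visited in descending
    order of probability. A pair is accepted only if it keeps the alignment
    strictly order-preserving with respect to every pair accepted so far
    (an assumed consistency rule). Pairs below `threshold` are ignored.
    """
    ranked = sorted(pair_probs.items(), key=lambda item: item[1], reverse=True)
    aligned: List[Tuple[int, int]] = []
    for (i, j), prob in ranked:
        if prob < threshold:
            break  # all remaining pairs are even less reliable
        consistent = all(
            (i < a and j < b) or (i > a and j > b) for a, b in aligned
        )
        if consistent:
            aligned.append((i, j))
    return sorted(aligned)


if __name__ == "__main__":
    # Hypothetical probabilities for a 3-residue sequence and 3-site structure.
    probs = {(0, 0): 0.9, (1, 0): 0.85, (1, 1): 0.8, (2, 1): 0.6, (2, 2): 0.3}
    print(probability_alignment(probs))
    print(probability_alignment(probs, threshold=0.5))
    print(contact_gap_penalties([0, 2, 6], scale=0.5))
```

Raising `threshold` keeps only the most reliable site pairs, which mirrors the trade-off the abstract reports: fewer aligned pairs, but smaller root mean square deviations.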