Defay T R, Cohen F E
Graduate Group in Biophysics, University of California, San Francisco 94143-0450, USA.
J Mol Biol. 1996 Sep 20;262(2):314-23. doi: 10.1006/jmbi.1996.0515.
Threading algorithms attempt to solve the inverse protein folding problem: given a group of structures and a sequence, identify the structure that is most compatible with this sequence. A recent study of this class of algorithms by S. J. Wodak and colleagues suggests that while threading algorithms are capable of recognizing many folding motifs, their performance in truly blind predictions is disappointing, and the underlying alignments upon which the selections are based are frequently errant. To help overcome this problem we have developed a Test of Optimal Mutagenesis algorithm (TOM) that exploits information inherent in the variation between several homologues in a multiple sequence alignment. This information is used to help select the correct structural motif for the sequence from a database of known structures. A total of 305 high-resolution structures were selected to represent the set of known folds; 56 proteins were chosen that had at least one close structural match in this set. To test TOM, we attempted to determine which of the 305 folds was a match to each of the 56 protein sequences. TOM correctly predicts a close structural match for 45% of these proteins. THREADER, an algorithm chosen as a literature standard, correctly matched 20% of the test set. By comparing the performance of TOM, THREADER, and TOM NOVAR (a version of TOM without variability information), we conclude that the tendency of an amino acid to be buried or exposed is the dominant determinant of the success of threading algorithms. In addition, the structural alignments produced by TOM suggest that the exact alignment of just 30 to 50% of the residues in a sequence with the correct fold is necessary to select it as the highest scoring match in a set of folds.
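The evaluation described above (pick the highest-scoring fold for each test sequence, then count how often that fold is a close structural match) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `top1_accuracy`, the dictionary layout, the assumption that a higher score means better compatibility, and the toy sequences, folds, and scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def top1_accuracy(
    scores: Dict[str, Dict[str, float]],
    true_matches: Dict[str, Set[str]],
) -> float:
    """Fraction of sequences whose top-scoring fold is a known close match.

    scores:       sequence id -> {fold id -> compatibility score}
                  (assumed: higher score = more compatible)
    true_matches: sequence id -> set of fold ids accepted as close matches
    """
    hits = 0
    for seq_id, fold_scores in scores.items():
        # Select the single highest-scoring fold for this sequence.
        best_fold = max(fold_scores, key=fold_scores.get)
        if best_fold in true_matches[seq_id]:
            hits += 1
    return hits / len(scores)


# Hypothetical toy data: two sequences scored against three folds.
toy_scores = {
    "seqA": {"fold1": 2.5, "fold2": 1.0, "fold3": -0.3},
    "seqB": {"fold1": 0.2, "fold2": 0.9, "fold3": 1.4},
}
toy_truth = {"seqA": {"fold1"}, "seqB": {"fold2"}}

print(top1_accuracy(toy_scores, toy_truth))
```

In the study's setting, `scores` would hold 56 sequences scored against 305 folds each, and the reported success rates (45% for TOM, 20% for THREADER) correspond to this kind of top-ranked-match fraction.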