Panchenko A R, Marchler-Bauer A, Bryant S H
National Center for Biotechnology Information, National Institutes of Health, Building 38A, Room 8N805, Bethesda, MD 20894, USA.
J Mol Biol. 2000 Mar 10;296(5):1319-31. doi: 10.1006/jmbi.2000.3541.
Using a benchmark set of structurally similar proteins, we conduct a series of threading experiments intended to identify a scoring function with an optimal combination of contact-potential and sequence-profile terms. The benchmark set is selected to include many medium-difficulty fold recognition targets, where sequence similarity is undetectable by BLAST but structural similarity is extensive. The contact potential is based on the log-odds of non-local contacts involving different amino acid pairs, in native as opposed to randomly compacted structures. The sequence profile term is that used in PSI-BLAST. We find that combination of these terms significantly improves the success rate of fold recognition over use of either term alone, with respect to both recognition sensitivity and the accuracy of threading models. Improvement is greatest for targets between 10% and 20% sequence identity and 60% to 80% superimposable residues, where the number of models crossing critical accuracy and significance thresholds more than doubles. We suggest that these improvements account for the successful performance of the combined scoring function at CASP3. We discuss possible explanations as to why sequence-profile and contact-potential terms appear complementary.
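The two scoring terms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the functional form of the combination nor any weights, so the linear mix in `combined_score`, the `weight` and `pseudocount` parameters, all function names, and the toy contact counts are assumptions made purely for illustration. The sketch computes a log-odds contact potential from pair counts in native versus reference (randomly compacted) structures, then mixes a contact score with a profile score.

```python
import math


def pair_key(a, b):
    """Order-independent key for an amino acid pair, e.g. ('L', 'V')."""
    return tuple(sorted((a, b)))


def log_odds_contact_potential(native_counts, reference_counts, pseudocount=1.0):
    """Log-odds of observing each residue pair in contact in native
    structures relative to a reference set (assumed here to stand in
    for randomly compacted structures). Positive values mean the pair
    is enriched in native contacts. The pseudocount is an assumption
    added to avoid log(0)."""
    pairs = set(native_counts) | set(reference_counts)
    native_total = sum(native_counts.values()) + pseudocount * len(pairs)
    reference_total = sum(reference_counts.values()) + pseudocount * len(pairs)
    potential = {}
    for pair in pairs:
        p_native = (native_counts.get(pair, 0) + pseudocount) / native_total
        p_reference = (reference_counts.get(pair, 0) + pseudocount) / reference_total
        potential[pair] = math.log(p_native / p_reference)
    return potential


def contact_score(threaded_contacts, potential):
    """Sum the potential over residue pairs brought into contact by a
    threading alignment. Pairs absent from the potential contribute 0."""
    return sum(potential.get(pair_key(a, b), 0.0) for a, b in threaded_contacts)


def combined_score(contact_term, profile_term, weight=0.5):
    """Hypothetical linear combination of the two terms; the weighting
    actually used in the paper is not stated in the abstract."""
    return weight * contact_term + (1.0 - weight) * profile_term


# Invented toy counts, for illustration only (not data from the paper).
native = {pair_key("L", "V"): 30, pair_key("D", "K"): 10}
reference = {pair_key("L", "V"): 20, pair_key("D", "K"): 20}

potential = log_odds_contact_potential(native, reference)
score = combined_score(
    contact_score([("V", "L"), ("K", "D")], potential),
    profile_term=4.0,  # placeholder for a PSI-BLAST-style profile score
)
print(potential, score)
```

In this sketch a higher score is treated as better for both terms; a real threading program would also need to place the two terms on comparable scales before combining them.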