Liu Tianyun, Samudrala Ram
Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA.
Protein Eng Des Sel. 2006 Sep;19(9):431-7. doi: 10.1093/protein/gzl027. Epub 2006 Jul 14.
The key to an accurate method of protein structure prediction is the development of an effective discriminatory function. Knowledge-based discriminatory functions extract parameters from statistical analysis of experimentally determined protein structures. We assess how the quality of the protein structures used for compiling statistics affects the performance of a residue-specific all-atom probability discriminatory function (RAPDF). We find that the discriminatory power of the RAPDF correlates, in a statistically significant manner, with the quality of the structural dataset on which it is parameterized. The overrepresentation of unfavorable contacts in low-resolution and NMR structures contributes to the major errors in the compilation of the conditional probabilities. Such errors weaken the discriminatory power of the function, especially when decoy conformations also contain considerable numbers of unfavorable contacts. These results indicate that using high-resolution structural datasets after filtering out unfavorable contacts can improve the performance of knowledge-based discriminatory functions.
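To make concrete the idea of compiling conditional probabilities from a structural dataset, the following is a minimal sketch of a distance-dependent log-odds score, not the authors' implementation. The abstract does not give the scoring formula; the form used here, the negative log of P(distance bin | atom-type pair) divided by P(distance bin), is an assumption, as are the function names `compile_log_odds` and `score_conformation`, the representation of a contact as an (atom type, atom type, distance bin) tuple, and the choice to score unseen contacts as neutral.

```python
import math
from collections import Counter


def compile_log_odds(observations):
    """Compile a log-odds table from observed contacts (hypothetical helper).

    observations: iterable of (atom_type_a, atom_type_b, distance_bin) tuples
    taken from a set of reference structures.
    Returns a dict mapping ((type_a, type_b), distance_bin) to
    -ln( P(bin | pair) / P(bin) ), an assumed scoring form.
    """
    pair_bin = Counter()    # count of each (pair, bin) combination
    pair_total = Counter()  # count of each pair over all bins
    bin_total = Counter()   # count of each bin over all pairs
    total = 0
    for a, b, d in observations:
        pair = tuple(sorted((a, b)))  # treat (a, b) and (b, a) as one pair
        pair_bin[(pair, d)] += 1
        pair_total[pair] += 1
        bin_total[d] += 1
        total += 1

    scores = {}
    for (pair, d), n in pair_bin.items():
        p_cond = n / pair_total[pair]   # P(bin | pair)
        p_prior = bin_total[d] / total  # P(bin), pooled over all pairs
        scores[(pair, d)] = -math.log(p_cond / p_prior)
    return scores


def score_conformation(contacts, scores):
    """Sum the log-odds scores of a conformation's contacts; lower is better.

    Contacts never seen during compilation contribute 0.0 (an assumption).
    """
    total = 0.0
    for a, b, d in contacts:
        pair = tuple(sorted((a, b)))
        total += scores.get((pair, d), 0.0)
    return total
```

Under this sketch, a contact that is overrepresented in the reference set receives an inflated conditional probability and therefore a more favorable score, which is one way the dataset errors described in the abstract could weaken discrimination against decoys that contain the same unfavorable contacts.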