Domingues F S, Lackner P, Andreeva A, Sippl M J
Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob Haringer Strasse 3, Salzburg, A-5020, Austria.
J Mol Biol. 2000 Apr 7;297(4):1003-13. doi: 10.1006/jmbi.2000.3615.
The biological role, biochemical function, and structure of uncharacterized protein sequences are often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure-derived alignments. Using this benchmark, we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.
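To make the idea of scoring by alignment shifts concrete, the following is a minimal sketch in Python. It is not the evaluation engine described here, whose exact shift definition and summary statistics are not given in the abstract. It assumes that a pairwise alignment is supplied as two gapped strings of equal length with `-` as the gap character, and that the shift of a residue in the first protein is the difference between the partner position assigned by the tested alignment and the partner position in the structure-derived reference. The names `aligned_pairs` and `shift_stats` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Alignment = Tuple[str, str]  # two gapped rows of equal length, '-' marks a gap


def aligned_pairs(row_a: str, row_b: str) -> Dict[int, int]:
    """Map each aligned residue index of protein A to its partner index in B."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Dict[int, int] = {}
    i = j = 0  # running residue indices in A and B (gaps are not counted)
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs[i] = j
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def shift_stats(
    reference: Alignment, test: Alignment
) -> Dict[str, Union[List[int], float, Optional[float]]]:
    """Compare a test alignment against a reference alignment.

    Assumed definition: for every residue of A that is aligned in both
    alignments, shift = partner index in test - partner index in reference.
    A shift of 0 means the residue pair is reproduced exactly.
    """
    ref = aligned_pairs(*reference)
    tst = aligned_pairs(*test)
    shifts = [tst[i] - ref[i] for i in ref if i in tst]
    n_ref = len(ref)
    return {
        "shifts": shifts,
        # fraction of reference pairs reproduced exactly by the test alignment
        "fraction_correct": sum(s == 0 for s in shifts) / n_ref if n_ref else 0.0,
        # mean absolute shift over residues aligned in both alignments
        "mean_abs_shift": sum(abs(s) for s in shifts) / len(shifts) if shifts else None,
    }


# Toy example: the test alignment is offset by one position from the reference.
reference = ("ACDEFG", "ACDEFG")
test = ("ACDEFG-", "-ACDEFG")
result = shift_stats(reference, test)
```

In this toy case every residue aligned in both alignments is displaced by one position, so no reference pair is reproduced exactly. Averaging such statistics over all protein pairs in a benchmark set would give a set-wide accuracy estimate, again under the assumed definition of a shift.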