Department of Informatics, University of Bergen, Bergen, Norway.
Comput Biol Chem. 2011 Jun;35(3):174-88. doi: 10.1016/j.compbiolchem.2011.04.008. Epub 2011 May 13.
Protein structure comparison by pairwise alignment is commonly used to identify highly similar substructures in pairs of proteins and to provide a measure of structural similarity based on the size and geometric similarity of the match. These scores are routinely applied in analyses of protein fold space under the assumption that high statistical significance is equivalent to a meaningful relationship. The truth of this assumption has previously been difficult to test, however, because of a lack of automated methods that do not rely on the same underlying principles. To address this, we present a method based on topological descriptions of global protein structure, providing an independent means to assess the ability of structural alignment to maintain meaningful structural correspondences on a large scale. Using a large set of decoys of specified global fold, we benchmark three widely used methods for structure comparison (SAP, TM-align and DALI) and test the degree to which this assumption is justified for each of them. Applying a topological edit distance measure as a scale of the degree of fold change shows that, while there is a broad correlation between high structural alignment scores and low edit distances, there remain many pairs with highly significant scores that differ by core strand swaps and are therefore structurally different on a global level. Possible causes of this problem and its implications for present assessments of protein fold space are discussed.
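To illustrate the idea of an edit distance over fold topologies, the following is a minimal sketch only. The abstract does not specify how topologies are encoded or which edit operations the authors use, so both are assumptions here: a sheet is encoded as a hypothetical string giving the spatial order of its strands (labelled by sequence position), and the distance is a plain Levenshtein distance with unit costs. The names `edit_distance`, `native` and `swapped` are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Assumption: unit cost for insertion, deletion and substitution.
    This is a generic stand-in, not the measure used in the paper.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical encoding: spatial order of strands within one sheet.
native = "1234"   # strands adjacent in sequence order
swapped = "1324"  # the two core strands exchanged

print(edit_distance(native, native))   # 0
print(edit_distance(native, swapped))  # 2
```

Under this toy encoding, a core strand swap gives a nonzero distance even though the two structures contain the same strands and could superpose well locally, which is the kind of global difference an alignment score alone may not expose.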