Tufts University, Medford.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jan-Feb;9(1):286-93. doi: 10.1109/TCBB.2011.70. Epub 2011 Apr 1.
Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level and that demonstrates qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.
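As a rough illustration of what clustering domains by geometric dissimilarity can look like, the sketch below runs a small average-linkage agglomerative clustering over a table of pairwise dissimilarities and stops merging at a chosen cutoff. Everything in it is an assumption made for illustration: the abstract does not state which linkage rule or cutoffs the authors used, the domain labels `d1` to `d4` are hypothetical, and the dissimilarity values are invented rather than produced by Matt.

```python
from itertools import combinations


def average_linkage_clusters(names, dist, threshold):
    """Greedily merge the two closest clusters until the closest
    remaining pair is farther apart than `threshold`.

    names: list of domain labels (hypothetical here)
    dist:  dict mapping frozenset({a, b}) -> pairwise dissimilarity
    """
    clusters = [frozenset([n]) for n in names]

    def linkage(c1, c2):
        # Average-linkage: mean dissimilarity over all cross-cluster pairs.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) > threshold:
            break  # nothing left that is close enough to merge
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]

    return sorted(sorted(c) for c in clusters)


# Invented dissimilarities for four hypothetical domains (not Matt output).
names = ["d1", "d2", "d3", "d4"]
dist = {
    frozenset(("d1", "d2")): 0.10,
    frozenset(("d3", "d4")): 0.20,
    frozenset(("d1", "d3")): 0.80,
    frozenset(("d1", "d4")): 0.90,
    frozenset(("d2", "d3")): 0.70,
    frozenset(("d2", "d4")): 0.85,
}

tight = average_linkage_clusters(names, dist, threshold=0.5)
loose = average_linkage_clusters(names, dist, threshold=1.0)
print(tight)
print(loose)
```

Cutting the same hierarchy at a tighter or looser threshold yields finer or coarser groupings, which is loosely analogous to comparing a clustering against SCOP at different levels; the specific thresholds here carry no biological meaning.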