Konagurthu Arun S, Stuckey Peter J, Lesk Arthur M
Department of Biochemistry and Molecular Biology and The Huck Institute for Genomics, Proteomics and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA.
Bioinformatics. 2008 Mar 1;24(5):645-51. doi: 10.1093/bioinformatics/btm641. Epub 2008 Jan 5.
Comparing and classifying the folding patterns in a database of protein structures is crucial to understanding the principles of protein architecture, evolution and function. Current methods of searching for proteins with similar folding patterns are slow and computationally intensive, and the sharp growth in the number of known protein structures poses severe challenges for structural comparison. Methods are needed that can search the database of structures both accurately and rapidly. We provide several methods to search for similar folding patterns using a concise tableau representation of proteins that encodes the relative geometry of secondary structural elements. Our first approach extracts identical and very closely related protein folding patterns in constant time per hit. Next, we address the computationally hard problem of extracting maximally similar subtableaux when comparing two tableaux. We solve this problem using quadratic and linear integer programming formulations and demonstrate their power to identify subtle structural similarities, especially when protein structures have diverged significantly. Finally, we describe TableauSearch, a rapid and accurate method for comparing a query structure against a database of protein domains. TableauSearch is fast enough to search the entire structural database in seconds on a standard desktop computer. Our analysis of TableauSearch on many queries shows that the method is very accurate in identifying similarities of folding patterns, even between distantly related proteins.
A web server implementing TableauSearch is available at http://hollywood.bx.psu.edu/TabSearch.
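As an illustration of the first approach described in the abstract (retrieving identical folding patterns in constant time per hit), the following is a minimal sketch, not the authors' implementation. It assumes a tableau can be reduced to a hashable key built from the secondary structure element types and a coarse code for each pairwise orientation angle. The four-way angle binning, the function names `encode_angle` and `tableau_key`, and the class `TableauIndex` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def encode_angle(omega_deg):
    """Map an orientation angle in degrees to a coarse one-letter code.

    The four 90-degree bins used here are an assumption for illustration,
    not the encoding defined in the paper.
    """
    omega = (omega_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if -45.0 <= omega < 45.0:
        return "P"  # roughly parallel
    if 45.0 <= omega < 135.0:
        return "R"  # roughly perpendicular, positive sense
    if -135.0 <= omega < -45.0:
        return "L"  # roughly perpendicular, negative sense
    return "O"      # roughly antiparallel


def tableau_key(sse_types, angles):
    """Build a hashable key from SSE types and the upper triangle of angles."""
    n = len(sse_types)
    codes = tuple(
        encode_angle(angles[i][j]) for i in range(n) for j in range(i + 1, n)
    )
    return (tuple(sse_types), codes)


class TableauIndex:
    """Hash table from tableau keys to the domains that share that key."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, domain_id, sse_types, angles):
        self._table[tableau_key(sse_types, angles)].append(domain_id)

    def lookup(self, sse_types, angles):
        # One hash probe once the key is built; returns a copy of the hit list.
        return list(self._table.get(tableau_key(sse_types, angles), []))


# Usage with made-up data: one helix ("H") and two strands ("E").
index = TableauIndex()
index.add(
    "domain_A",
    ["H", "E", "E"],
    [[0, 170, 10], [170, 0, -175], [10, -175, 0]],
)
hits = index.lookup(
    ["H", "E", "E"],
    [[0, 160, 20], [160, 0, 178], [20, 178, 0]],
)
```

In this sketch, building the key costs time proportional to the number of element pairs in the query, after which each lookup is a single expected-constant-time hash probe. Structures whose angles fall in the same bins collide on purpose, which is how closely related patterns are retrieved together.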