School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Sackler Institute of Molecular Medicine, Tel Aviv University, Tel Aviv, Israel.
Bioinformatics. 2011 Apr 1;27(7):925-32. doi: 10.1093/bioinformatics/btr044. Epub 2011 Feb 3.
The database of known protein structures (PDB) is growing rapidly, creating a need for methods that can cope with the vast amount of structural data. Analyzing the accumulating data requires a fast tool for identifying similar structures and clustering them by structural resemblance. Several excellent tools have been developed for comparing protein structures. These usually address local structure alignment, an important yet computationally intensive problem. It is therefore difficult to use such tools to compare a large number of structures to each other in a reasonable time.
Here we present GOSSIP, a novel method for global all-against-all alignment of any set of protein structures. The method detects similarities between structures down to a certain cutoff (a parameter of the program), which allows it to detect similar structures much faster than local structure alignment methods. GOSSIP compares many structures several orders of magnitude faster than well-known available structure alignment servers, and it is also faster than a database scanning method. We evaluate GOSSIP both on a dataset of short structural fragments and on two large sequence-diverse structural benchmarks. We conclude that, for a threshold of 0.6 and above, GOSSIP achieves its speed with no compromise in the accuracy of the alignments or in the number of detected global similarities.
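The general shape of an all-against-all comparison with a similarity cutoff can be illustrated with a minimal Python sketch. It does not reproduce GOSSIP's algorithm, its scoring, or its speed: `toy_similarity` is a hypothetical stand-in score, the function and parameter names are assumptions made for illustration, and the default cutoff of 0.6 is only borrowed from the threshold quoted above.

```python
from itertools import combinations
from math import dist


def toy_similarity(a, b, step=3):
    """Hypothetical stand-in score in (0, 1]; NOT GOSSIP's actual measure.

    Compares the distances between points i and i+step along each chain.
    These distances do not change under rotation or translation.
    """
    # A global comparison here requires equal-length chains (toy assumption).
    if len(a) != len(b) or len(a) <= step:
        return 0.0
    profile_a = [dist(a[i], a[i + step]) for i in range(len(a) - step)]
    profile_b = [dist(b[i], b[i + step]) for i in range(len(b) - step)]
    mean_diff = sum(abs(x - y) for x, y in zip(profile_a, profile_b)) / len(profile_a)
    return 1.0 / (1.0 + mean_diff)


def all_against_all(structures, cutoff=0.6, score=toy_similarity):
    """Score every unordered pair once and keep pairs at or above the cutoff."""
    hits = []
    for (name_a, a), (name_b, b) in combinations(structures.items(), 2):
        s = score(a, b)
        if s >= cutoff:
            hits.append((name_a, name_b, s))
    return hits


# Three toy "structures" given as lists of 3D points (made-up data).
line = [(float(i), 0.0, 0.0) for i in range(6)]
shifted = [(x + 5.0, y, z) for x, y, z in line]  # same shape, translated
zigzag = [(float(i), float(i % 2) * 4.0, 0.0) for i in range(6)]

hits = all_against_all({"line": line, "shifted": shifted, "zigzag": zigzag})
print(hits)  # only the line/shifted pair passes the cutoff
```

The sketch enumerates every pair explicitly, so its cost grows quadratically with the number of structures; it shows only the role of the cutoff as a filter on reported pairs, not how the speed reported above is obtained.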
A server and a downloadable executable are available at http://bioinfo3d.cs.tau.ac.il/gossip/.