Faculty of Physics, Department of Biophysics and CoE BioExploratorium, University of Warsaw, Żwirki i Wigury 93, Warsaw, Poland.
BMC Bioinformatics. 2011 Aug 17;12:344. doi: 10.1186/1471-2105-12-344.
Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships.
We present DEscriptor Defined Alignment (DEDAL), a novel method for protein structure comparison based on the formalism of local descriptors of protein structure. Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as the SISY and RIPC sets, which include non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs, the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy).
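To make the accuracy figures above concrete, the following is a minimal sketch of one common way to score an alignment against a curated reference: the fraction of reference residue pairs that the predicted alignment reproduces. This definition is an assumption made for illustration and may differ from the scoring used in the benchmark; the function name `alignment_accuracy` and the toy residue pairs are hypothetical.

```python
def alignment_accuracy(predicted, reference):
    """Fraction of reference residue pairs reproduced by the predicted alignment.

    Assumed definition, for illustration only. Each alignment is a collection of
    (residue index in protein A, residue index in protein B) pairs.
    """
    reference = set(reference)
    if not reference:
        raise ValueError("reference alignment is empty")
    return len(reference & set(predicted)) / len(reference)


# Hypothetical toy data. The reference is non-sequential: the second block of
# protein A maps to earlier positions in protein B, as in a circular permutation.
reference = [(1, 51), (2, 52), (3, 53), (60, 5), (61, 6)]
predicted = [(1, 51), (2, 52), (3, 54), (60, 5)]

print(alignment_accuracy(predicted, reference))
```

Because the alignments are treated as unordered sets of residue pairs, this score does not penalize non-sequential alignments, which is what a benchmark containing circular permutations needs.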
DEDAL is fast enough to be used in whole-proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving the automatic classification of structural domains, analyzing protein fold space, and refining protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.