Nussinov R, Wolfson H J
Sackler Institute of Molecular Medicine, Faculty of Medicine, Tel Aviv University, Israel.
Proc Natl Acad Sci U S A. 1991 Dec 1;88(23):10495-9. doi: 10.1073/pnas.88.23.10495.
Macromolecules carrying biological information often consist of independent modules containing recurring structural motifs. Detection of a specific structural motif within a protein (or DNA) aids in elucidating the role played by the protein (DNA element) and the mechanism of its operation. The number of crystallographically known structures at high resolution is increasing very rapidly. Yet, comparison of three-dimensional structures is a laborious time-consuming procedure that typically requires a manual phase. To date, there is no fast automated procedure for structural comparisons. We present an efficient O(n3) worst case time complexity algorithm for achieving such a goal (where n is the number of atoms in the examined structure). The method is truly three-dimensional, sequence-order-independent, and thus insensitive to gaps, insertions, or deletions. This algorithm is based on the geometric hashing paradigm, which was originally developed for object recognition problems in computer vision. It introduces an indexing approach based on transformation invariant representations and is especially geared toward efficient recognition of partial structures in rigid objects belonging to large data bases. This algorithm is suitable for quick scanning of structural data bases and will detect a recurring structural motif that is a priori unknown. The algorithm uses protein (or DNA) structures, atomic labels, and their three-dimensional coordinates. Additional information pertaining to the structure speeds the comparisons. The algorithm is straightforwardly parallelizable, and several versions of it for computer vision applications have been implemented on the massively parallel connection machine. A prototype version of the algorithm has been implemented and applied to the detection of substructures in proteins.
携带生物信息的大分子通常由包含重复结构基序的独立模块组成。在蛋白质(或DNA)中检测特定的结构基序有助于阐明该蛋白质(DNA元件)所起的作用及其运作机制。高分辨率晶体学已知结构的数量正在迅速增加。然而,三维结构的比较是一个费力且耗时的过程,通常需要人工操作。迄今为止,还没有用于结构比较的快速自动化程序。我们提出了一种高效的最坏情况时间复杂度为O(n3)的算法(其中n是被检查结构中的原子数)来实现这一目标。该方法是真正的三维、与序列顺序无关的,因此对缺口、插入或缺失不敏感。此算法基于几何哈希范式,该范式最初是为计算机视觉中的目标识别问题而开发的。它引入了一种基于变换不变表示的索引方法,特别适用于高效识别属于大型数据库的刚性物体中的部分结构。该算法适用于快速扫描结构数据库,并将检测到一个事先未知的重复结构基序。该算法使用蛋白质(或DNA)结构、原子标签及其三维坐标。与结构相关的附加信息加快了比较速度。该算法易于并行化,并且已经在大规模并行连接机上实现了几个用于计算机视觉应用的版本。该算法的一个原型版本已经实现并应用于蛋白质中亚结构的检测。