Escalier V, Pothier J, Soldano H, Viari A
Atelier de BioInformatique, Paris, France.
J Comput Biol. 1998 Spring;5(1):41-56. doi: 10.1089/cmb.1998.5.41.
In this paper, we present an algorithm to find three-dimensional substructures common to two or more molecules. The basic algorithm is devoted to pairwise structural comparison. Given two sets of atomic coordinates, it finds the largest subsets of atoms which are "similar" in the sense that all internal distances are approximately conserved. The basic idea of the algorithm is to recursively build subsets of increasing sizes, combining two sets of size k to build a set of size k + 1. The algorithm can be used "as is" for small molecules or local parts of proteins (about 30 atoms). When a high number of atoms is involved, we use a two step procedure. First we look for common "local" fragments by using the previous algorithm, and then we gather these fragments by using a Branch and Bound technique. We also extend the basic algorithm to perform multiple comparisons, by using one of the structures as a reference point (pivot) to which all other structures are compared. The solution is the largest subsets of atoms common to the pivot and at least q other structures. Although both algorithms are theoretically exponential in the number of atoms, experiments performed on biological data and using realistic parameters show that the solution is obtained within a few minutes. Finally, an application to the determination of the structural core of seven globins is presented.
在本文中,我们提出了一种算法,用于找出两个或多个分子共有的三维子结构。基本算法致力于成对结构比较。给定两组原子坐标,它会找到原子的最大子集,这些原子在所有内部距离大致保持不变的意义上是“相似的”。该算法的基本思想是递归地构建尺寸不断增大的子集,将两个尺寸为k的集合组合起来构建一个尺寸为k + 1的集合。该算法可直接用于小分子或蛋白质的局部区域(约30个原子)。当涉及大量原子时,我们采用两步法。首先,我们使用先前的算法寻找共同的“局部”片段,然后使用分支定界技术收集这些片段。我们还扩展了基本算法以进行多重比较,通过将其中一个结构用作参考点(枢轴),将所有其他结构与之比较。结果是枢轴与至少q个其他结构共有的原子的最大子集。尽管这两种算法在理论上随原子数量呈指数增长,但对生物数据进行的实验以及使用实际参数表明,几分钟内就能得到结果。最后,展示了该算法在确定七种球蛋白结构核心方面的应用。