Rarey M, Dixon J S
German National Research Center for Information Technology (GMD), Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany.
J Comput Aided Mol Des. 1998 Sep;12(5):471-90. doi: 10.1023/a:1008068904628.
In this paper we present a new method for evaluating molecular similarity between small organic compounds. Instead of a linear representation such as a fingerprint, a more complex description, the feature tree, is computed for each molecule. A feature tree represents the hydrophobic fragments and functional groups of the molecule and the way these groups are linked together. Each node in the tree is labeled with a set of features representing the chemical properties of the part of the molecule corresponding to that node. Two feature trees are compared by matching their subtrees onto each other. Two algorithms for tackling the matching problem are described in this paper. On a dataset of about 1000 molecules, we demonstrate the ability of our approach to identify molecules belonging to the same class of inhibitors. With a second dataset of 58 molecules with known binding modes taken from the Brookhaven Protein Data Bank, we show that the matchings produced by our algorithms are compatible with the relative orientation of the molecules in the active site in 61% of the test cases. The average computation time for a pair comparison is about 50 ms on a current workstation.
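To make the idea of a feature tree more concrete, the following is a minimal Python sketch of the data structure and of one very simple way to compare two such trees. It is an illustration only and is not either of the two matching algorithms described in the paper. The class `Node`, the functions `node_similarity`, `match_score`, and `similarity`, the feature names (`hydrophobic`, `donor`, `acceptor`), the feature values, and the normalization are all assumptions introduced here; the abstract does not specify them. The sketch treats both trees as rooted and tries every pairing of children by brute force, which is only practical for very small trees.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Node:
    """One node of a (hypothetical) feature tree: a fragment plus its features."""
    label: str
    features: dict                      # feature name -> non-negative number
    children: list = field(default_factory=list)


def node_similarity(a, b):
    """Sum of per-feature minima divided by sum of maxima (1.0 = identical)."""
    keys = set(a.features) | set(b.features)
    lo = sum(min(a.features.get(k, 0), b.features.get(k, 0)) for k in keys)
    hi = sum(max(a.features.get(k, 0), b.features.get(k, 0)) for k in keys)
    return lo / hi if hi else 1.0


def match_score(a, b):
    """Best total score when a is mapped onto b and children are paired recursively."""
    small, large = sorted((a.children, b.children), key=len)
    best = 0.0
    # Brute force: try every assignment of the shorter child list to the longer one.
    for perm in permutations(large, len(small)):
        best = max(best, sum(match_score(x, y) for x, y in zip(small, perm)))
    return node_similarity(a, b) + best


def size(t):
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def similarity(a, b):
    """Match score normalized to the range [0, 1] by the sizes of both trees."""
    return 2 * match_score(a, b) / (size(a) + size(b))


# Two toy trees with made-up feature values.
tree_a = Node("ring", {"hydrophobic": 6}, [
    Node("amide", {"donor": 1, "acceptor": 1}),
    Node("methyl", {"hydrophobic": 1}),
])
tree_b = Node("ring", {"hydrophobic": 5}, [
    Node("carboxyl", {"donor": 1, "acceptor": 2}),
])

print(similarity(tree_a, tree_b))   # roughly 0.6 for these toy values
print(similarity(tree_a, tree_a))   # 1.0 for identical trees
```

In this toy example the two ring nodes are matched to each other, the carboxyl node is paired with the amide node because they share donor and acceptor features, and the methyl node stays unmatched, which lowers the normalized score.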