LaSIGE, Faculty of Sciences, University of Lisbon, Campo Grande, 1749-016 Lisbon, Portugal.
J Chem Inf Model. 2013 Oct 28;53(10):2511-24. doi: 10.1021/ci400324u. Epub 2013 Oct 8.
Measuring similarity between molecules is a fundamental problem in cheminformatics. Because similar molecules tend to have similar physical, chemical, and biological properties, the notion of molecular similarity plays an important role in the exploration of molecular data sets, in query retrieval in molecular databases, and in structure-property/activity modeling. Various methods for defining structural similarity between molecules are available in the literature, but so far none has given consistent and reliable results in all situations. We propose a new method, based on atom alignment, for analyzing structural similarity between molecules. The method compares the bonding profiles of atoms in comparable molecules and includes features that are seldom found in other structural or graph matching approaches, such as chirality and double bond stereoisomerism. The similarity measure is then defined on the annotated molecular graph, using an iterative directed graph similarity procedure and an optimal alignment between the atoms of the two molecules obtained with a pairwise matching algorithm. With the proposed approach, the detected similarities are easier to understand intuitively because the similar atoms in the molecules are shown explicitly. This noncontiguous atom matching structural similarity method (NAMS) was tested and compared with one of the most widely used similarity methods (fingerprint-based similarity) on three difficult data sets with different characteristics. Despite its higher computational cost, the method performed well: it distinguished both different and very similar hydrocarbons that were indistinguishable using a fingerprint-based approach.
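The alignment step described above can be illustrated with a minimal sketch. It is not the NAMS implementation: the atom-to-atom similarity values are invented for illustration (in NAMS they come from bonding profiles and the iterative graph similarity procedure), a brute-force search stands in for a proper assignment solver such as the Hungarian algorithm, and the Jaccard-like normalization and the self-similarity of 1.0 per atom are assumptions. The names `best_alignment` and `jaccard_like` are hypothetical.

```python
from itertools import permutations


def best_alignment(sim):
    """Return (total, pairs) for the one-to-one atom alignment that
    maximizes the summed similarity.

    sim[i][j] is the similarity between atom i of molecule A and atom j
    of molecule B. Brute force: only practical for very small molecules.
    """
    n_a, n_b = len(sim), len(sim[0])
    if n_a > n_b:
        # Work on the transpose so the smaller molecule indexes the rows,
        # then swap the pair order back to (atom of A, atom of B).
        transposed = [list(col) for col in zip(*sim)]
        total, pairs = best_alignment(transposed)
        return total, [(i, j) for j, i in pairs]
    best_total, best_pairs = -1.0, []
    for cols in permutations(range(n_b), n_a):
        total = sum(sim[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


def jaccard_like(s_ab, s_aa, s_bb):
    """Assumed normalization: aligned score over the 'union' of self-scores."""
    return s_ab / (s_aa + s_bb - s_ab)


# Hypothetical atom similarities: molecule A has 2 atoms, molecule B has 3.
sim_ab = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
]
s_ab, pairs = best_alignment(sim_ab)
# Assumption: each atom has self-similarity 1.0, so S_AA = 2 and S_BB = 3.
score = jaccard_like(s_ab, s_aa=2.0, s_bb=3.0)
print(pairs, round(score, 3))
```

Because the alignment is returned as explicit atom pairs, the matched atoms can be displayed alongside the score, which is the property the abstract highlights. The third atom of B stays unmatched, and the normalization penalizes that size difference.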
NAMS also confirmed the similarity principle on a data set of structurally similar steroids that differ in their binding affinity to the corticosteroid binding globulin receptor: pairs of steroids with a high degree of similarity (>80%) tend to have smaller absolute differences in binding activity. On a highly diverse set of compounds annotated with their monoamine oxidase inhibition level, the method also recovered a significantly higher average fraction of active compounds when the seed is active, across different similarity cutoff thresholds. In particular, for cutoff thresholds of 86%, 93%, and 96.5%, NAMS recovered a fraction of actives of 0.57, 0.63, and 0.83, respectively, whereas the fingerprint-based approach recovered 0.41, 0.40, and 0.39, respectively. NAMS is freely available to the community as a simple Web-based tool, along with its Python source code, at http://nams.lasige.di.fc.ul.pt/.
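The "fraction of actives recovered" figures can be read as follows, sketched here under assumptions: for each active seed compound, take every other compound whose similarity to the seed reaches the cutoff, compute the share of those that are active, and average over seeds. The exact protocol used in the paper may differ. The similarity matrix and activity labels below are invented toy data, and the function names are hypothetical.

```python
def fraction_of_actives(similarities, is_active, seed, cutoff):
    """Share of active compounds among those at least `cutoff` similar to
    `seed` (the seed itself is excluded). Returns None if nothing is retrieved."""
    hits = [
        j for j, s in enumerate(similarities[seed])
        if j != seed and s >= cutoff
    ]
    if not hits:
        return None
    return sum(1 for j in hits if is_active[j]) / len(hits)


def mean_fraction_for_active_seeds(similarities, is_active, cutoff):
    """Average the per-seed fraction over all active seeds that retrieve
    at least one compound. Returns None if no seed retrieves anything."""
    values = [
        fraction_of_actives(similarities, is_active, seed, cutoff)
        for seed, active in enumerate(is_active) if active
    ]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Toy data (not from the paper): 4 compounds, the first two are active.
is_active = [True, True, False, False]
similarities = [
    [1.00, 0.95, 0.90, 0.50],
    [0.95, 1.00, 0.60, 0.88],
    [0.90, 0.60, 1.00, 0.70],
    [0.50, 0.88, 0.70, 1.00],
]
for cutoff in (0.86, 0.93):
    print(cutoff, mean_fraction_for_active_seeds(similarities, is_active, cutoff))
```

In this toy example, raising the cutoff from 0.86 to 0.93 drops the inactive neighbors and the recovered fraction rises from 0.5 to 1.0, the same kind of trend the abstract reports for NAMS.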