Miranda-Quintana Ramón Alain, Bajusz Dávid, Rácz Anita, Héberger Károly
Department of Chemistry, University of Florida, Gainesville, FL, 32603, USA.
Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary.
J Cheminform. 2021 Apr 23;13(1):32. doi: 10.1186/s13321-021-00505-3.
Quantifying the similarity of objects is a key concept in many areas of computational science, including cheminformatics, where molecular similarity is usually quantified from binary fingerprints. Although a wide selection of molecular representations and similarity metrics is available, there have been no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules). The present study bridges this gap by introducing a straightforward computational framework for comparing multiple objects at the same time and by providing extended formulas for as many similarity metrics as possible. In the binary case (i.e. when two molecules are compared pairwise), the extended formulas reduce naturally to their well-known pairwise forms. We provide a detailed analysis of the effects of various parameters on the similarity values calculated with the extended formulas. The extended similarity indices are entirely general and do not depend on the fingerprints used. Two types of analysis of variance (ANOVA) help to understand the main features of the indices: (i) ANOVA of mean similarity indices; (ii) ANOVA of sum of ranking differences (SRD). Practical aspects and applications of the extended similarity indices are detailed in the accompanying paper: Miranda-Quintana et al. J Cheminform. 2021. https://doi.org/10.1186/s13321-021-00504-4 . Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons .
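The idea of comparing n fingerprints at once, and of recovering the familiar pairwise value when n = 2, can be illustrated with a minimal Python sketch. This is not the code from the MultipleComparisons repository and does not reproduce the weighted formulas of the paper: the function names (`extended_counts`, `extended_tanimoto`, `pairwise_tanimoto`), the unweighted counting scheme, and the default coincidence threshold of n mod 2 are all assumptions made here for illustration. Each bit position is classified by how many of the n molecules have it set, and a Tanimoto-like ratio is then formed from those counts.

```python
import numpy as np


def extended_counts(fps, threshold=None):
    """Classify every bit position across n binary fingerprints.

    fps: array-like of shape (n_molecules, n_bits) holding 0/1 values.
    threshold: hypothetical coincidence threshold; a position counts as
        "similar" only if |2k - n| exceeds it, where k is the number of
        molecules with the bit set. Defaults to n % 2 (an assumption).
    Returns (n_1_similar, n_0_similar, n_dissimilar).
    """
    fps = np.asarray(fps, dtype=int)
    n_molecules, n_bits = fps.shape
    if threshold is None:
        threshold = n_molecules % 2
    ones_per_bit = fps.sum(axis=0)           # k for every bit position
    delta = 2 * ones_per_bit - n_molecules   # > 0: mostly ones, < 0: mostly zeros
    sim_1 = int(np.sum(delta > threshold))   # bit set in (nearly) all molecules
    sim_0 = int(np.sum(delta < -threshold))  # bit unset in (nearly) all molecules
    dissimilar = n_bits - sim_1 - sim_0      # everything in between
    return sim_1, sim_0, dissimilar


def extended_tanimoto(fps, threshold=None):
    """Unweighted, Tanimoto-like index over n fingerprints (illustrative only)."""
    sim_1, _, dissimilar = extended_counts(fps, threshold)
    denominator = sim_1 + dissimilar
    return sim_1 / denominator if denominator else 0.0


def pairwise_tanimoto(x, y):
    """Standard pairwise Tanimoto: |x AND y| / |x OR y|."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    union = int(np.sum(x | y))
    return int(np.sum(x & y)) / union if union else 0.0


if __name__ == "__main__":
    fp_a = [1, 1, 0, 0, 1]
    fp_b = [1, 0, 0, 1, 1]
    fp_c = [1, 1, 0, 0, 0]
    # With two molecules the n-ary sketch and the pairwise formula agree.
    print(extended_tanimoto([fp_a, fp_b]), pairwise_tanimoto(fp_a, fp_b))
    # With three molecules a single value describes the whole set.
    print(extended_tanimoto([fp_a, fp_b, fp_c]))
```

For two fingerprints the three counts correspond to the usual a, d, and b + c of the pairwise contingency table, which is why the sketch returns the same value as `pairwise_tanimoto`. A single pass over the columns also yields one similarity value for the whole set without enumerating all pairs.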