López-Pérez Kenneth, Miranda-Quintana Ramón Alain
Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA.
bioRxiv. 2025 Mar 13:2025.03.09.642269. doi: 10.1101/2025.03.09.642269.
Activity cliffs are an important challenge in cheminformatics and drug design. One of the most common indicators used to quantify them is the Structure-Activity Landscape Index (SALI). Here we expose mathematical limitations of SALI's formulation, the most evident being that it is undefined when the similarity between two molecules is one. We show how a simple Taylor series can address this main problem, yielding a well-defined expression that retains the ranking information of the original SALI. The second issue is the quadratic complexity of using SALI to describe the roughness of the activity landscape of a set of molecules. To solve it, we propose iCliff, an indicator that quantifies this roughness in linear complexity. For this, we leverage the iSIM framework to obtain the average similarity of the set, and a rearrangement of terms to obtain the average of the squared property differences. Calculations on 30 different activity cliff (AC)-focused databases suggest a strong correlation between iCliff and the average of the pairwise Taylor series form of SALI. To further explore the individual effect of removing each molecule from the activity landscape, we propose complementary iCliff. With this tool, we were able to identify the molecules that form a high number of activity cliffs with the rest of the molecules in the set.
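The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the truncation order of the series is an arbitrary assumption (the abstract does not state one), and the iSIM similarity step is left out. The first part contrasts the classical SALI, |P_i - P_j| / (1 - sim), with a truncated geometric (Taylor) expansion of 1 / (1 - sim). The second part checks the standard identity that lets the average squared property difference be computed in linear time.

```python
def sali(p_i, p_j, sim):
    """Classical SALI: |P_i - P_j| / (1 - sim). Undefined when sim == 1."""
    return abs(p_i - p_j) / (1.0 - sim)


def ts_sali(p_i, p_j, sim, order=3):
    """Truncated Taylor form: |P_i - P_j| * (1 + sim + ... + sim**order).

    The default order is an assumption made for illustration only.
    """
    series = sum(sim ** k for k in range(order + 1))
    return abs(p_i - p_j) * series


def mean_sq_diff_pairwise(props):
    """O(N^2) reference: average of (p_i - p_j)^2 over all pairs i < j."""
    n = len(props)
    total = sum(
        (props[i] - props[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def mean_sq_diff_linear(props):
    """O(N) version, using sum_{i<j} (p_i - p_j)^2 = N * sum(p^2) - (sum p)^2."""
    n = len(props)
    total = n * sum(p * p for p in props) - sum(props) ** 2
    return total / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Identical fingerprints (sim == 1) break the classical index...
    try:
        sali(7.0, 5.0, 1.0)
    except ZeroDivisionError:
        print("SALI is undefined at sim = 1")
    # ...while the truncated series stays finite.
    print(ts_sali(7.0, 5.0, 1.0))

    # Made-up property values, used only to compare the two averages.
    props = [5.0, 6.5, 7.0, 9.2]
    print(mean_sq_diff_pairwise(props), mean_sq_diff_linear(props))
```

The linear-time form needs only the running sums of the property values and of their squares, which is why the property term scales linearly with the number of molecules.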