Ideaconsult Ltd., 4 A.Kanchev str., Sofia 1000, Bulgaria.
Curr Top Med Chem. 2012;12(18):1987-2001. doi: 10.2174/156802612804910304.
The concepts of the structure-activity relationship (SAR) landscape and of activity cliffs originate in medicinal chemistry and receptor-ligand interaction modelling. While intuitive, the definition of an activity cliff as a "pair of structurally similar compounds with large differences in potency" is commonly recognized as ambiguous. This paper proposes a new and efficient method for identifying activity cliffs and visualizing activity landscapes. The activity cliff definition could be improved to reflect not the cliff steepness alone, but also the rate of change of the steepness. The method requires similarity and activity difference thresholds to be set explicitly, but provides the means to explore multiple thresholds and to visualize in a single map how the thresholds affect activity cliff identification. The identification of activity cliffs is addressed by reformulating the problem as a statistical one and introducing a probabilistic measure: the likelihood of a compound having a large activity difference compared to other compounds while being highly similar to them. The likelihood is effectively a quantification of a structure-activity similarity (SAS) map with defined thresholds. Calculating the likelihood relies on four counts only and does not require storing the pairwise matrix, which is a significant advantage when processing large datasets. The method generates a list of individual compounds, ranked according to the likelihood of their involvement in the formation of activity cliffs, and so goes beyond characterizing cliffs by structure pairs only. The visualisation is implemented by considering the activity plane fixed and analysing the irregularities of the similarity itself. It provides a convenient analogy to a topographic map and may help identify the most appropriate similarity representation for each specific SAR space. The proposed method has been applied to several datasets representing different biological activities.
Finally, the method is implemented as part of the existing open-source Ambit package and can be accessed via an OpenTox API-compliant web service and via an interactive application running in a modern, JavaScript-enabled web browser. Combined with the functionalities already offered by the OpenTox framework, such as data sharing and remote calculations, it could be a useful tool for exploring chemical landscapes online.
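The counting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the Ambit implementation, and the abstract does not state the likelihood formula, so the score below (the fraction of a compound's similar neighbours that also differ strongly in activity) is only an assumed stand-in. The function names, the set-based fingerprints, the Tanimoto similarity and the default thresholds are all hypothetical choices made for illustration. The sketch keeps four counts per compound, one per quadrant of a thresholded SAS map, and never stores the pairwise matrix.

```python
from typing import FrozenSet, List, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_cliff_compounds(
    fingerprints: Sequence[FrozenSet[int]],
    activities: Sequence[float],
    sim_threshold: float = 0.7,   # assumed value, not taken from the paper
    act_threshold: float = 1.0,   # assumed value, e.g. one log unit
) -> List[Tuple[int, float, Tuple[int, int, int, int]]]:
    """Rank compounds by an illustrative cliff score.

    Returns (index, score, counts) tuples, best score first, where counts is
    (similar & large diff, similar & small diff,
     dissimilar & large diff, dissimilar & small diff).
    """
    n = len(fingerprints)
    counts = [[0, 0, 0, 0] for _ in range(n)]
    # Each pair is visited once and only updates per-compound counters,
    # so memory stays linear in the number of compounds.
    for i in range(n):
        for j in range(i + 1, n):
            similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
            large = abs(activities[i] - activities[j]) >= act_threshold
            quadrant = (0 if large else 1) if similar else (2 if large else 3)
            counts[i][quadrant] += 1
            counts[j][quadrant] += 1

    ranked = []
    for i, c in enumerate(counts):
        similar_total = c[0] + c[1]
        # Assumed stand-in score: share of similar neighbours with a large
        # activity difference. The published likelihood may be defined differently.
        score = c[0] / similar_total if similar_total else 0.0
        ranked.append((i, score, tuple(c)))
    return sorted(ranked, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 4, 5}),
           frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
    acts = [5.0, 5.2, 8.0, 5.1]
    for index, score, quadrant_counts in rank_cliff_compounds(fps, acts):
        print(index, round(score, 2), quadrant_counts)
```

Calling the function repeatedly with different `sim_threshold` and `act_threshold` values gives a simple way to see how the thresholds change the ranking, which is the kind of exploration the abstract describes.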