Novartis Institutes for Biomedical Research, 250 Massachusetts Ave., Cambridge, Massachusetts 02139, USA.
J Chem Inf Model. 2011 Dec 27;51(12):3158-68. doi: 10.1021/ci2004994. Epub 2011 Dec 7.
From a medicinal chemistry point of view, one of the primary goals of high-throughput screening (HTS) hit list assessment is the identification of chemotypes with an informative structure-activity relationship (SAR). Such chemotypes may enable optimization of the primary potency, as well as of selectivity and pharmacokinetic properties. A common way to prioritize them is molecular clustering of the hits. Typical clustering techniques, however, rely on a general notion of chemical similarity or on standard rules of scaffold decomposition and are thus insensitive to molecular features that are enriched in biologically active compounds. This hinders SAR analysis, because compounds sharing the same pharmacophore might not end up in the same cluster and are thus not directly compared to each other by the medicinal chemist. Similarly, common chemotypes that are not related to activity may contaminate clusters, distracting from important chemical motifs. By combining molecular similarity with Bayesian models, we introduce (I) a robust, activity-aware clustering approach and (II) a feature mapping method for the elucidation of distinct SAR determinants in polypharmacologic compounds. We evaluated the method on 462 dose-response assays from the PubChem BioAssay repository. Activity-aware clustering grouped compounds sharing molecular cores that were specific for the target or pathway at hand, rather than grouping inactive scaffolds commonly found in compound series. We also found many of these core structures in the literature discussing the SAR of the respective targets. A numerical comparison of cores allowed for identification of the structural prerequisites for polypharmacology, i.e., distinct bioactive regions within a single compound, and pointed toward selectivity-conferring medchem strategies. The method presented here is generally applicable to any type of activity data and may help bridge the gap between hit list assessment and designing a medchem strategy.
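The abstract does not spell out how molecular similarity and Bayesian models are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each compound is described by a set of substructure features, scores each feature with a Laplacian-corrected naive Bayes weight, and then computes a Tanimoto similarity in which only activity-enriched features count. The feature names, the toy compounds, and the functions `bayes_feature_weights` and `weighted_tanimoto` are all hypothetical.

```python
import math
from collections import Counter

def bayes_feature_weights(compounds, actives):
    """Laplacian-corrected naive Bayes weight per feature (assumed scheme).

    weight(F) = log((A_F + 1) / (N_F * base_rate + 1)), where N_F is the
    number of compounds containing F and A_F the number of actives among them.
    """
    base_rate = len(actives) / len(compounds)
    n_with, a_with = Counter(), Counter()
    for name, feats in compounds.items():
        for f in feats:
            n_with[f] += 1
            if name in actives:
                a_with[f] += 1
    return {f: math.log((a_with[f] + 1) / (n_with[f] * base_rate + 1))
            for f in n_with}

def weighted_tanimoto(fa, fb, weights):
    """Tanimoto similarity counting only features with positive weight."""
    def pos(f):
        return max(weights.get(f, 0.0), 0.0)
    union = sum(pos(f) for f in fa | fb)
    if union == 0.0:
        return 0.0  # no activity-enriched feature in either compound
    return sum(pos(f) for f in fa & fb) / union

def plain_tanimoto(fa, fb):
    """Unweighted Tanimoto similarity on feature sets, for comparison."""
    return len(fa & fb) / len(fa | fb)

# Hypothetical toy hit list: feature sets stand in for fingerprint bits.
compounds = {
    "a1": {"core_X", "phenyl"},
    "a2": {"core_X", "phenyl", "amide"},
    "i1": {"phenyl", "amide"},
    "i2": {"phenyl"},
}
actives = {"a1", "a2"}

weights = bayes_feature_weights(compounds, actives)
sim_actives = weighted_tanimoto(compounds["a1"], compounds["a2"], weights)
sim_mixed = weighted_tanimoto(compounds["a1"], compounds["i2"], weights)
```

In this toy set, `core_X` occurs only in actives and receives a positive weight, while `phenyl` and `amide` occur at the base rate and receive a weight of zero. The two actives therefore become maximally similar under the weighted measure, and the active/inactive pair that shares only `phenyl` drops to zero, whereas plain Tanimoto would still score that pair at 0.5. Feeding such a similarity into any standard clustering routine is one way to make the clustering activity-aware.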