Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany.
J Chem Inf Model. 2013 Sep 23;53(9):2275-81. doi: 10.1021/ci4004078. Epub 2013 Sep 6.
It is well known that different molecular representations, e.g., graphs, numerical descriptors, fingerprints, or 3D models, change the numerical results of molecular similarity calculations. Because the assessment of structure-activity relationships (SARs) requires similarity and potency comparisons of active compounds, this representation dependence inevitably also affects SAR analysis. But to what extent? How exactly does SAR information change when alternative fingerprints are used as descriptors? What is the proportion of active compounds with substantial changes in SAR information induced by different fingerprints? To provide answers to these questions, we have quantified changes in SAR information across many different compound classes using six different fingerprints. SAR profiling was carried out on 128 target-based data sets comprising more than 60,000 compounds with high-confidence activity annotations. A numerical measure of SAR discontinuity was applied to assess SAR information on a per-compound basis. For ~70% of all test compounds, changes in SAR characteristics were detected when different fingerprints were used as molecular representations. Moreover, the SAR phenotype of ~30% of the compounds changed, and distinct fingerprint-dependent local SAR environments were detected. The fingerprints we compared were found to generate SAR models that were essentially not comparable. Atom environment and pharmacophore fingerprints produced the largest differences in compound-associated SAR information. Taken together, the results of our systematic analysis reveal larger fingerprint-dependent changes in compound-associated SAR information than would have been anticipated.
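The dependence described in the abstract can be illustrated with a minimal sketch. It is not the measure used in the study: the abstract does not give the formula, so the `discontinuity` function below is a hypothetical, simplified score (the similarity-weighted mean potency difference to neighbors above a similarity threshold). The bit sets, potency values, and the 0.5 threshold are likewise invented toy data, and the two "fingerprints" are plain sets of on-bit indices rather than output of any real fingerprint method.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def discontinuity(index, fingerprints, potencies, threshold=0.5):
    """Hypothetical per-compound score (an assumption, not the published measure):
    similarity-weighted mean potency difference to all neighbors whose
    Tanimoto similarity to the compound is at least `threshold`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for j, other in enumerate(fingerprints):
        if j == index:
            continue
        sim = tanimoto(fingerprints[index], other)
        if sim >= threshold:
            weighted_sum += sim * abs(potencies[index] - potencies[j])
            weight_total += sim
    # A compound without any neighbor above the threshold gets a score of 0.
    return weighted_sum / weight_total if weight_total else 0.0


# Toy potencies (e.g. pKi values) for three compounds; invented for illustration.
potencies = [9.0, 6.0, 8.8]

# The same three compounds under two invented fingerprint representations.
fp_a = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
fp_b = [frozenset({1, 2, 3, 4}), frozenset({6, 7, 8}), frozenset({1, 2, 3, 9})]

score_a = discontinuity(0, fp_a, potencies)  # neighbor is the weakly potent compound 1
score_b = discontinuity(0, fp_b, potencies)  # neighbor is the similarly potent compound 2

print(f"fingerprint A: {score_a:.2f}")
print(f"fingerprint B: {score_b:.2f}")
```

In this toy setting the first compound has a large potency gap to its only neighbor under representation A and a small one under representation B, so the same compound would look SAR-discontinuous in one model and SAR-continuous in the other. This is the kind of per-compound change the study quantifies, although with real fingerprints and its own discontinuity measure.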