O'Hagan Steve, Kell Douglas B
School of Chemistry, The University of Manchester, 131 Princess St, Manchester, M1 7DN UK.
Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN UK.
J Cheminform. 2017 Mar 9;9:18. doi: 10.1186/s13321-017-0198-y. eCollection 2017.
In previous work, we assessed the structural similarities between marketed drugs ('drugs') and endogenous natural human metabolites ('metabolites' or 'endogenites'), using 'fingerprint' methods in common use together with the Tanimoto and Tversky similarity metrics, and found that the fingerprint encoding used had a dramatic effect on the apparent similarities observed. By contrast, the maximal common substructure (MCS), once the method of determining it is fixed, provides a measure of similarity that is largely independent of the fingerprints and that has a clear chemical meaning. Here we explored the utility of the MCS and of metrics derived from it. In many cases, a shared scaffold helps cluster drugs and endogenites, and gives insight into the enzymes (in particular transporters) that they share. Tanimoto and Tversky similarities based on the MCS tend to be smaller than those based on the MACCS fingerprint-type encoding, though the converse is also true for a significant fraction of the comparisons. While no single molecular descriptor can account for these differences, a machine learning-based analysis of the nature of the differences (MACCS_Tanimoto vs MCS_Tversky) shows that they are indeed deterministic, although the features used in the model to account for them vary greatly with each individual drug. The extent of its utility and interpretability varies with the drug of interest, implying that while the MCS is neither 'better' nor 'worse' for every drug-endogenite comparison, it is sufficiently different to be of value. The overall conclusion is thus that the use of the MCS provides an additional and valuable strategy for understanding the structural basis for similarities between synthetic, marketed drugs and natural intermediary metabolites.
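To make the MCS-derived metrics concrete, the following is a minimal sketch of how Tanimoto and Tversky similarities can be computed once the size of the MCS between two molecules is known. It uses the common set-based definitions applied to size counts. The abstract does not state whether sizes are counted in atoms or bonds, nor which Tversky weights were used, so the function names, the choice of atom counts, and the default `alpha`/`beta` values below are all assumptions made for illustration only.

```python
def _check_sizes(n_a: int, n_b: int, n_mcs: int) -> None:
    """Validate that the MCS is no larger than either molecule."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("molecule sizes must be positive")
    if not 0 <= n_mcs <= min(n_a, n_b):
        raise ValueError("MCS size must lie between 0 and min(n_a, n_b)")


def mcs_tanimoto(n_a: int, n_b: int, n_mcs: int) -> float:
    """Tanimoto similarity from sizes: |MCS| / (|A| + |B| - |MCS|)."""
    _check_sizes(n_a, n_b, n_mcs)
    return n_mcs / (n_a + n_b - n_mcs)


def mcs_tversky(n_a: int, n_b: int, n_mcs: int,
                alpha: float = 1.0, beta: float = 0.0) -> float:
    """Tversky similarity: |MCS| / (|MCS| + alpha*|A only| + beta*|B only|).

    alpha and beta are illustrative defaults (an assumption), not values
    taken from the paper. With alpha = beta = 1 this equals Tanimoto.
    """
    _check_sizes(n_a, n_b, n_mcs)
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    only_a = n_a - n_mcs  # part of A outside the common substructure
    only_b = n_b - n_mcs  # part of B outside the common substructure
    denom = n_mcs + alpha * only_a + beta * only_b
    return n_mcs / denom if denom > 0 else 0.0


# Hypothetical sizes: a 20-atom drug, a 10-atom metabolite, an 8-atom MCS.
drug_atoms, metabolite_atoms, shared_atoms = 20, 10, 8
tanimoto = mcs_tanimoto(drug_atoms, metabolite_atoms, shared_atoms)
tversky_drug = mcs_tversky(drug_atoms, metabolite_atoms, shared_atoms, 1.0, 0.0)
tversky_metabolite = mcs_tversky(drug_atoms, metabolite_atoms, shared_atoms, 0.0, 1.0)
```

The asymmetry of the Tversky form is what makes it useful here: weighting only the metabolite side asks how much of the (usually smaller) metabolite is contained in the drug, which can be high even when the symmetric Tanimoto value is low.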