Naveja J Jesús, Pilón-Jiménez B Angélica, Bajorath Jürgen, Medina-Franco José L
PECEM, School of Medicine, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico.
Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico.
J Cheminform. 2019 Sep 24;11(1):61. doi: 10.1186/s13321-019-0380-5.
Scaffold analysis of compound data sets has reemerged as a chemically interpretable alternative to machine learning for the analysis of chemical space and structure-activity relationships. In this context, analog series-based scaffolds (ASBS) are synthetically relevant core structures that represent individual series of analogs. As an extension of ASBS, we introduce a general conceptual framework that considers all putative cores of the molecules in a compound data set, thus softening the commonly applied "single molecule-single scaffold" correspondence. A putative core is defined here as any substructure of a molecule that complies with two basic rules: (a) the size of the core is a significant proportion of the size of the whole molecule and (b) the substructure can be reached from the original molecule by successively applying retrosynthetic rules. A bipartite network consisting of molecules and cores can then be constructed for a database of chemical structures. Compounds linked to the same core are considered analogs. We present case studies illustrating the potential of the general framework. The applications range from inter- and intra-core diversity analysis of compound data sets and structure-property relationships to the identification of analog series and ASBS. The molecule-core network presented here is a general methodology with multiple applications in scaffold analysis. New statistical methods are envisioned that will be able to draw quantitative conclusions from these data. Code implementing the method presented in this work is freely available as an additional file. Follow-up applications include analog searching and core structure-property relationship analyses.
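The two core rules and the molecule-core bipartite network described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. The molecule and core labels, the heavy-atom counts, the function names, and the 2/3 size threshold are all hypothetical choices made for illustration, and the candidate fragments are assumed to come from an upstream retrosynthetic fragmentation step that is not shown.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: molecule id -> (heavy-atom count, candidate fragments).
# Each candidate fragment is (core label, heavy-atom count) and is ASSUMED to
# have been produced upstream by retrosynthetic fragmentation (rule b).
MOLECULES = {
    "mol_A": (15, [("core_1", 12), ("core_2", 6)]),
    "mol_B": (14, [("core_1", 12), ("core_3", 10)]),
    "mol_C": (13, [("core_3", 10)]),
    "mol_D": (20, [("core_4", 8)]),
}


def putative_cores(n_heavy, fragments, min_fraction=2 / 3):
    """Rule (a): keep fragments covering at least min_fraction of the molecule.

    The default threshold is an assumption for this sketch, not a value
    taken from the abstract.
    """
    return {label for label, size in fragments if size >= min_fraction * n_heavy}


def build_network(molecules, min_fraction=2 / 3):
    """Build both sides of the bipartite molecule-core network as dicts of sets."""
    mol_to_cores = {}
    core_to_mols = defaultdict(set)
    for mol_id, (n_heavy, fragments) in molecules.items():
        cores = putative_cores(n_heavy, fragments, min_fraction)
        mol_to_cores[mol_id] = cores
        for core in cores:
            core_to_mols[core].add(mol_id)
    return mol_to_cores, dict(core_to_mols)


def analog_pairs(core_to_mols):
    """Molecules linked to the same core are treated as analogs."""
    pairs = set()
    for mols in core_to_mols.values():
        pairs.update(combinations(sorted(mols), 2))
    return pairs


mol_to_cores, core_to_mols = build_network(MOLECULES)
pairs = analog_pairs(core_to_mols)
```

In this toy data set, `mol_B` is linked to two cores at once, which is the softening of the "single molecule-single scaffold" correspondence that the framework aims for, while `mol_D` has no fragment large enough to qualify and therefore remains without a core.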