Buehler Ye, Reymond Jean-Louis
Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland.
J Chem Inf Model. 2025 Aug 25;65(16):8405-8410. doi: 10.1021/acs.jcim.5c00334. Epub 2025 May 15.
A recurring question when selecting molecules for investigation is that of molecular complexity: is there a price to pay for complexity in terms of synthesis difficulty, and is complexity related to biological properties? In the chemical space of small organic molecules enumerated from mathematical graphs in the GDBs (Generated DataBases), most compounds are too complex and challenging to synthesize despite containing only standard functional groups and ring types. For these GDB molecules, we find that the fraction (MC1) or number (MC2) of non-divalent nodes in the molecular graph provides a simple measure of molecular complexity, with higher values indicating greater complexity, which we interpret in terms of potential synthesis difficulties. We also show that MC1 and MC2 are applicable to commercial screening compounds (ZINC), bioactive molecules (ChEMBL) and natural products (COCONUT), and we compare them with previously reported measures of molecular complexity and synthetic accessibility.
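As an illustration only, the two measures can be sketched on a bare molecular graph. The snippet assumes that a "non-divalent node" is any heavy atom whose number of heavy-atom neighbours differs from two, and that MC1 is that count divided by the total number of nodes. The abstract does not spell out these details, so the published definitions may differ (for example in how particular functional groups are treated). The function name `complexity_counts` and the edge-list input format are hypothetical choices made for this sketch.

```python
def complexity_counts(num_atoms, bonds):
    """Return (mc1, mc2) for a heavy-atom graph.

    num_atoms: number of nodes (heavy atoms), indexed 0..num_atoms-1.
    bonds:     iterable of (i, j) index pairs, one per bond (bond order ignored).

    Assumption for this sketch: a node is "non-divalent" when its
    number of neighbours in the graph is anything other than 2.
    """
    degree = [0] * num_atoms
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1

    mc2 = sum(1 for d in degree if d != 2)      # number of non-divalent nodes
    mc1 = mc2 / num_atoms if num_atoms else 0.0  # fraction of non-divalent nodes
    return mc1, mc2


# Cyclohexane: six ring atoms, each with exactly two neighbours.
cyclohexane = (6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])

# Methylcyclohexane: ring atom 0 gains a third neighbour (atom 6, a terminal node).
methylcyclohexane = (7, cyclohexane[1] + [(0, 6)])

# Isobutane: a central atom with three neighbours and three terminal atoms.
isobutane = (4, [(0, 1), (0, 2), (0, 3)])

for name, graph in [("cyclohexane", cyclohexane),
                    ("methylcyclohexane", methylcyclohexane),
                    ("isobutane", isobutane)]:
    mc1, mc2 = complexity_counts(*graph)
    print(f"{name}: MC1={mc1:.2f}, MC2={mc2}")
```

Under these assumptions, an unsubstituted ring scores zero on both measures, and each branching or terminal atom raises the count. In practice the graph would be derived from a structure file or SMILES string with a cheminformatics toolkit rather than typed in by hand.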