Bühlmann Sven, Reymond Jean-Louis
Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland.
Front Chem. 2020 Feb 4;8:46. doi: 10.3389/fchem.2020.00046. eCollection 2020.
The generated database GDB17 enumerates 166.4 billion molecules of up to 17 atoms of C, N, O, S and halogens, following simple rules of chemical stability and synthetic feasibility. However, most molecules in GDB17 are too complex to be considered for chemical synthesis. To address this limitation, we report GDBChEMBL, a subset of GDB17 featuring 10 million molecules selected according to a ChEMBL-likeness score (CLscore), calculated from the frequency of occurrence of circular substructures in ChEMBL, followed by uniform sampling across molecular size, stereocenters and heteroatoms. Compared to the previously reported subsets FDB17 and GDBMedChem, selected from GDB17 by fragment-likeness and medicinal chemistry criteria, respectively, our new subset features molecules that are more synthetically accessible and possibly bioactive, yet it retains the broad and continuous coverage of chemical space typical of the entire GDB17. GDBChEMBL is available for download at http://gdb.unibe.ch and can be browsed using an interactive chemical space map at http://faerun.gdb.tools.
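The two-stage selection described in the abstract (score by substructure frequency, then sample uniformly across descriptor bins) can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. It assumes each molecule has already been reduced to a set of circular-substructure identifiers (for example, atom-centred environments of the kind used in Morgan/ECFP fingerprints, computed with a cheminformatics toolkit). It also assumes a simple score: the mean of log(1 + count) over a molecule's substructures, where count is the number of reference molecules containing that substructure. The published CLscore definition may differ. The helper names, the score threshold and the (size, stereocenters, heteroatoms) tuple used as the stratum are all illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def substructure_frequencies(reference_sets):
    """Count how many reference molecules contain each substructure key.

    Hypothetical helper: each molecule is assumed to be given as an
    iterable of precomputed circular-substructure identifiers.
    """
    counts = Counter()
    for keys in reference_sets:
        counts.update(set(keys))  # count each key at most once per molecule
    return counts


def likeness_score(keys, counts):
    """Assumed score: mean log(1 + count) over a molecule's distinct keys.

    Keys never seen in the reference set contribute log(1) = 0,
    pulling the average down. Not necessarily the published CLscore.
    """
    keys = set(keys)
    if not keys:
        return 0.0
    return sum(math.log(counts.get(k, 0) + 1) for k in keys) / len(keys)


def stratified_sample(molecules, strata_key, per_stratum, seed=0):
    """Draw up to `per_stratum` molecules from each stratum (bin).

    `strata_key` maps a molecule to a hashable, sortable bin label,
    e.g. a hypothetical (size, stereocenters, heteroatoms) tuple.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for mol in molecules:
        bins[strata_key(mol)].append(mol)
    sample = []
    for label in sorted(bins):
        members = bins[label]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Toy usage with made-up keys and descriptors (illustrative only).
reference = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
counts = substructure_frequencies(reference)
candidates = [
    {"id": 1, "keys": ["a", "b"], "strata": (10, 0, 2)},
    {"id": 2, "keys": ["x", "y"], "strata": (10, 0, 2)},
    {"id": 3, "keys": ["a", "d"], "strata": (12, 1, 3)},
    {"id": 4, "keys": ["b", "c"], "strata": (12, 1, 3)},
]
threshold = 0.5  # hypothetical cut-off on the assumed score
kept = [m for m in candidates if likeness_score(m["keys"], counts) >= threshold]
subset = stratified_sample(kept, lambda m: m["strata"], per_stratum=1)
print(sorted(m["id"] for m in subset))
```

In this sketch, molecule 2 is dropped because none of its substructures occur in the reference set, and the sampling step then takes one survivor from each bin, so densely populated bins cannot dominate the subset. Scoring first and stratifying second mirrors the order stated in the abstract; the binning granularity and per-bin quota here are arbitrary choices.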