Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland.
J Chem Inf Model. 2023 Jan 23;63(2):484-492. doi: 10.1021/acs.jcim.2c01107. Epub 2022 Dec 19.
The generated databases (GDBs) list billions of possible molecules obtained by systematic enumeration following simple rules of chemical stability and synthetic feasibility. To assess the originality of GDB molecules, we compared their Bemis and Murcko molecular frameworks (MFs) with those in public databases. An MF is derived from a molecule by converting all atoms to carbon and all bonds to single bonds, then iteratively removing terminal atoms until no terminal atoms remain. We compared GDB-13s (99,394,177 molecules up to 13 atoms containing simplified functional groups, 22,130 MFs) with ZINC (885,905,524 screening compounds, 1,016,597 MFs), PubChem50 (100,852,694 molecules up to 50 atoms, 1,530,189 MFs), and COCONUT (401,624 natural products, 42,734 MFs). While MFs in public databases mostly contained linker bonds and six-membered rings, GDB-13s MFs had diverse ring sizes and ring systems without linker bonds. Most GDB-13s MFs were exclusive to this database, and many were relatively simple, making them attractive targets for synthetic chemistry aimed at innovative molecules.
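The MF derivation described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `molecular_framework`, the atom-table and bond-list input format, and the 4-methylpyridine example are all assumptions made for illustration. The sketch only applies the three stated steps (all atoms to carbon, all bonds to single, iterative removal of terminal atoms) to a plain molecular graph.

```python
def molecular_framework(atoms, bonds):
    """Return (framework_atoms, framework_bonds) for a molecular graph.

    atoms: dict mapping atom index -> element symbol (hypothetical format)
    bonds: iterable of (i, j, bond_order) tuples (hypothetical format)
    """
    # Steps 1 and 2: every atom becomes carbon, every bond becomes single.
    generic_atoms = {i: "C" for i in atoms}
    neighbors = {i: set() for i in atoms}
    for i, j, _order in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # Step 3: remove terminal atoms (degree 0 or 1) until none remain.
    terminal = [i for i, nbrs in neighbors.items() if len(nbrs) <= 1]
    while terminal:
        i = terminal.pop()
        if i not in neighbors:
            continue  # already removed in an earlier pass
        for j in neighbors.pop(i):
            neighbors[j].discard(i)
            if len(neighbors[j]) <= 1:
                terminal.append(j)  # j has just become terminal
        del generic_atoms[i]

    framework_bonds = sorted(
        (i, j, 1) for i in neighbors for j in neighbors[i] if i < j
    )
    return generic_atoms, framework_bonds


# Assumed example: 4-methylpyridine (ring atoms 0..5, methyl carbon 6).
atoms = {0: "N", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C"}
bonds = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1),
         (4, 5, 2), (5, 0, 1), (3, 6, 1)]
fw_atoms, fw_bonds = molecular_framework(atoms, bonds)
print(sorted(fw_atoms))  # the six ring atoms remain; the methyl carbon is pruned
print(fw_bonds)          # six single bonds forming the ring
```

In this sketch an acyclic molecule yields an empty framework, because pruning continues until no atom is left. Comparing frameworks across databases would additionally require a canonical representation of each pruned graph, which is outside the scope of this example.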