Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th St, Mailcode 2550, San Francisco, California 94158-2330, United States.
Taras Shevchenko National University of Kyïv, 60 Volodymyrska Street, Kyïv 01601, Ukraine.
J Chem Inf Model. 2023 Feb 27;63(4):1166-1176. doi: 10.1021/acs.jcim.2c01253. Epub 2023 Feb 15.
Purchasable chemical space has grown rapidly into the tens of billions of molecules, providing unprecedented opportunities for ligand discovery but straining the tools that might exploit these molecules at scale. We have therefore developed ZINC-22, a database of commercially accessible small molecules derived from multi-billion-scale make-on-demand libraries. The new database and tools enable analog searching in this vast new space via a facile GUI, CartBlanche, drawing on similarity methods that scale sublinearly in the number of molecules. The new library also uses data organization methods, enabling rapid lookup of molecules and their physical properties, including conformations, partial atomic charges, cLogP values, and solvation energies, all crucial for molecular docking, which had become slow with older database organizations in previous versions of ZINC. As the libraries have continued to grow, we have been interested in finding whether molecular diversity has suffered, for instance, because certain scaffolds have come to dominate via easy analoging. This has not occurred thus far, and chemical diversity continues to grow with database size, with a log increase in Bemis-Murcko scaffolds for every two-log unit increase in database size. Most new scaffolds come from compounds with the highest heavy atom count. Finally, we consider the implications for databases like ZINC as the libraries grow toward and beyond the trillion-molecule range. ZINC is freely available to everyone and may be accessed at cartblanche22.docking.org, via Globus, and in the Amazon AWS and Oracle OCI clouds.
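The scaffold-growth statement above (one log unit of Bemis-Murcko scaffolds per two log units of database size) can be read as a power law with exponent 0.5, that is, scaffold count growing roughly as the square root of library size. The short Python sketch below works through that arithmetic. The exponent and the function names are assumptions made for illustration, inferred from the sentence in the abstract rather than taken from the authors' fitted data or code.

```python
import math

# Assumption: "one log of scaffolds per two logs of molecules" is read as
# scaffolds proportional to N**0.5. The exponent is inferred from the abstract's
# wording, not from a fit to ZINC-22 data.
SCALING_EXPONENT = 0.5


def scaffold_growth_factor(size_before, size_after, exponent=SCALING_EXPONENT):
    """Multiplicative growth in scaffold count implied by the assumed power law."""
    if size_before <= 0 or size_after <= 0:
        raise ValueError("database sizes must be positive")
    return (size_after / size_before) ** exponent


def log10_scaffold_gain(log10_size_gain, exponent=SCALING_EXPONENT):
    """Log10 units of scaffold growth for a given log10 growth in database size."""
    return exponent * log10_size_gain


if __name__ == "__main__":
    # Hypothetical example: a library growing 100-fold (two log units).
    factor = scaffold_growth_factor(1e9, 1e11)
    print(f"Implied scaffold growth factor: {factor:.1f}")
    print(f"Log10 scaffold gain for 3 logs of size: {log10_scaffold_gain(3):.1f}")
```

Under this reading, a library that grows from billions to trillions of molecules (three log units) would be expected to gain about 1.5 log units of scaffolds, provided the trend reported so far continues to hold.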