Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA.
Xemistry GmbH, Schwalbenweg 5, D-61479, Glashütten, Germany.
Sci Data. 2020 Nov 11;7(1):384. doi: 10.1038/s41597-020-00727-4.
We have made available a database of over 1 billion compounds predicted to be easily synthesizable, called the Synthetically Accessible Virtual Inventory (SAVI). The compounds were generated by a set of transforms written in an adapted and extended version of the CHMTRN/PATRAN programming languages, which encode expert knowledge of chemical synthesis and originally stem from the LHASA project. The chemoinformatics toolkit CACTVS was used to apply a total of 53 transforms to about 150,000 readily available building blocks (enamine.net). Only single-step, two-reactant syntheses were calculated for this database, even though the technology can execute multi-step reactions. The ability to incorporate scoring systems in CHMTRN allowed us to subdivide the database of 1.75 billion compounds into sets according to their predicted synthesizability, with the most-synthesizable class comprising 1.09 billion synthetic products. Properties calculated for all SAVI products indicate that the database should be well suited for drug discovery. It is publicly available for free download from https://doi.org/10.35115/37n9-5738.
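To put the quoted figures in proportion, the following is a minimal arithmetic sketch in Python. It uses only the rounded numbers stated in the abstract. The function names are hypothetical, and the "naive bound" rests on an assumption that is not taken from the paper: that every ordered pair of building blocks is tried with every transform. It is meant only to illustrate scale, not to describe how SAVI was actually enumerated.

```python
# Rounded figures quoted in the abstract (all approximate).
BUILDING_BLOCKS = 150_000      # "about 150,000" building blocks
TRANSFORMS = 53                # transforms applied
TOTAL_PRODUCTS = 1.75e9        # full database size
TOP_CLASS_PRODUCTS = 1.09e9    # most-synthesizable class


def top_class_share(top: float, total: float) -> float:
    """Fraction of all products that fall in the most-synthesizable class."""
    return top / total


def naive_pair_bound(n_blocks: int, n_transforms: int) -> int:
    """Hypothetical upper bound on single-step, two-reactant combinations.

    Assumption (not from the paper): every ordered pair of building
    blocks is attempted with every transform.
    """
    return n_blocks * n_blocks * n_transforms


if __name__ == "__main__":
    share = top_class_share(TOP_CLASS_PRODUCTS, TOTAL_PRODUCTS)
    bound = naive_pair_bound(BUILDING_BLOCKS, TRANSFORMS)
    print(f"Most-synthesizable share: {share:.1%}")
    print(f"Naive combination bound:  {bound:.3e}")
    print(f"Products / naive bound:   {TOTAL_PRODUCTS / bound:.3%}")
```

Under these assumptions, roughly three fifths of the products belong to the most-synthesizable class, and the 1.75 billion products are a small fraction of the naive combinatorial bound, which is consistent with each transform matching only building blocks that carry the required functional groups.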