Pharmaceutical Bioinformatics, Institute of Pharmaceutical Sciences, Albert-Ludwigs-University, Hermann-Herder-Str. 9, D-79104 Freiburg, Germany.
J Chem Inf Model. 2015 May 26;55(5):915-24. doi: 10.1021/acs.jcim.5b00116. Epub 2015 Apr 30.
Screening a reduced yet diverse and synthesizable region of chemical space is a critical step in drug discovery. The ZINC database is now routinely used to freely access and screen millions of commercially available compounds. We collected ∼125 million compounds from chemical catalogs and the ZINC database, yielding more than 68 million unique molecules, including a large portion of the described natural products (NPs) and drugs. The data set was filtered using advanced medicinal chemistry rules to remove potentially toxic, promiscuous, metabolically labile, or reactive compounds. We studied the physicochemical properties of this compilation and identified millions of NP-like, fragment-like, drug-like, and i-PPIs-like compounds (the last group resembling inhibitors of protein-protein interactions, i-PPIs). The corresponding focused libraries were subjected to a detailed scaffold diversity analysis and compared to reference NPs and marketed drugs. This study revealed thousands of diverse chemotypes, with building block combinations represented differently across the data sets. An analysis of the stereogenic and shape complexity of the libraries also showed that they present well-defined levels of complexity, following the trend: i-PPIs-like < drug-like < fragment-like < NP-like. Because the collected compounds are of great interest for drug discovery, particularly for virtual screening and library design, we offer a freely available collection of over 37 million molecules at http://pbox.pharmaceutical-bioinformatics.org, together with the filtering rules used to build the focused libraries described herein.
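The assignment of compounds to focused libraries by physicochemical properties can be illustrated with a minimal Python sketch. It does not reproduce the filtering rules distributed with the collection, which are not given in this abstract. As a stand-in, it assumes two widely cited heuristics: a strict form of Lipinski's rule of five for drug-likeness and a simplified rule of three for fragment-likeness. The names `Descriptors`, `is_drug_like`, `is_fragment_like`, and `assign_libraries` are hypothetical, and the descriptor values are assumed to have been computed beforehand with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed physicochemical descriptors for one compound (hypothetical container)."""
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated octanol-water partition coefficient
    h_donors: int      # number of hydrogen-bond donors
    h_acceptors: int   # number of hydrogen-bond acceptors


def is_drug_like(d: Descriptors) -> bool:
    # Assumption: strict rule of five, with no violation tolerated.
    return (d.mol_weight <= 500 and d.logp <= 5
            and d.h_donors <= 5 and d.h_acceptors <= 10)


def is_fragment_like(d: Descriptors) -> bool:
    # Assumption: simplified rule of three (weight, logP, donors, acceptors only).
    return (d.mol_weight <= 300 and d.logp <= 3
            and d.h_donors <= 3 and d.h_acceptors <= 3)


def assign_libraries(d: Descriptors) -> List[str]:
    """Return the labels of every focused library the compound qualifies for."""
    labels = []
    if is_fragment_like(d):
        labels.append("fragment-like")
    if is_drug_like(d):
        labels.append("drug-like")
    return labels


if __name__ == "__main__":
    # Illustrative descriptor values only; they do not describe real compounds.
    small = Descriptors(mol_weight=250.0, logp=2.0, h_donors=1, h_acceptors=3)
    medium = Descriptors(mol_weight=420.0, logp=4.1, h_donors=2, h_acceptors=6)
    large = Descriptors(mol_weight=780.0, logp=6.3, h_donors=6, h_acceptors=12)
    for name, desc in [("small", small), ("medium", medium), ("large", large)]:
        print(name, assign_libraries(desc))
```

A compound can carry more than one label in this sketch, since any molecule that passes the rule of three also passes the rule of five. NP-like and i-PPIs-like classes would need additional descriptors and rules that the abstract does not specify.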