van Hoorn Willem P, Bell Andrew S
Department of Chemistry, Pfizer Global Research and Development, Sandwich Laboratories, Sandwich, Kent CT13 9NJ, United Kingdom.
J Chem Inf Model. 2009 Oct;49(10):2211-20. doi: 10.1021/ci900072g.
The Pfizer Global Virtual Library (PGVL) is defined as the set of compounds that could be synthesized using validated protocols and monomers. At about 10^12 compounds, it is too large to search by brute-force methods for close analogues of a given input structure. This paper describes the Bayesian Idea Generator, which uses a novel application of Bayesian statistics to narrow the search space to a prioritized set of existing library arrays (16 by default). For each of these libraries, the 6 closest neighbors are retrieved from the existing compound file, yielding a screenable hypothesis of 96 compounds (16 x 6). Using the Bayesian models for library space, the Pfizer file of singleton compounds has also been mapped to library space and can optionally be searched as well. The method is >99% accurate in retrieving known library provenance from an independent test set. The compounds retrieved strike a balance between similarity and diversity, resulting in frequent scaffold hops. Four examples of successful use of the Bayesian Idea Generator in drug discovery are provided. The methodology can be applied to any collection of compounds containing distinct clusters, and an example using compound vendor catalogues is included.
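The two-stage search described above (rank libraries first, then retrieve nearest neighbors within the top-ranked ones) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the Bayesian model or the descriptors, so a simple Laplace-smoothed naive Bayes score over sets of integer feature IDs stands in for the model, and Tanimoto similarity is assumed for the neighbor search. The names `library_score` and `generate_ideas` are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a compound is represented as a set of integer feature IDs.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def library_score(query: Fingerprint, members: List[Fingerprint]) -> float:
    """Laplace-smoothed log-likelihood of the query's features under one library.

    This is a stand-in for the paper's Bayesian library models, which the
    abstract does not describe in detail.
    """
    n = len(members)
    score = 0.0
    for feature in query:
        count = sum(1 for member in members if feature in member)
        score += math.log((count + 1) / (n + 2))
    return score


def generate_ideas(
    query: Fingerprint,
    libraries: Dict[str, List[Fingerprint]],
    n_libraries: int = 16,  # default number of libraries stated in the abstract
    n_neighbors: int = 6,   # neighbors retrieved per library, per the abstract
) -> List[Tuple[str, Fingerprint]]:
    """Return up to n_libraries * n_neighbors (library name, compound) pairs."""
    # Stage 1: prioritize libraries by model score instead of scanning everything.
    ranked = sorted(
        libraries,
        key=lambda name: library_score(query, libraries[name]),
        reverse=True,
    )
    # Stage 2: nearest-neighbor search restricted to the top-ranked libraries.
    ideas: List[Tuple[str, Fingerprint]] = []
    for name in ranked[:n_libraries]:
        nearest = sorted(
            libraries[name],
            key=lambda fp: tanimoto(query, fp),
            reverse=True,
        )
        ideas.extend((name, fp) for fp in nearest[:n_neighbors])
    return ideas
```

With the default parameters and at least 16 libraries of 6 or more compounds each, the result contains 16 x 6 = 96 entries, matching the size of the screenable hypothesis stated above. Because only 6 compounds come from any one library, the output spans many chemotypes rather than clustering around the single most similar series.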