Bender Andreas, Mussa Hamse Y, Glen Robert C
Unilever Centre for Molecular Science Informatics, Department of Chemistry, Lensfield Road, University of Cambridge, Cambridge CB2 1EW, United Kingdom.
J Biomol Screen. 2005 Oct;10(7):658-66. doi: 10.1177/1087057105281048. Epub 2005 Sep 16.
A fragment-based similarity searching method, MOLPRINT 2D, was employed for virtual screening of Escherichia coli dihydrofolate reductase inhibitors. Using the original training set of 50,000 compounds, only marginal enrichment factors (between 1 and 3) could be achieved on the test library. The active structures contained in the training and test libraries represented different types of "chemistry", that is, different substructural features associated with activity. In a second step, the training and test sets were pooled and randomly split into new training and test sets of equal size, with the objective of smoothing out the different chemical characteristics of the two libraries. In a 10-fold cross-validation study on the new training and test sets, enrichment was typically 10-fold in the first 96 positions, 4-fold in the first 384 positions, and 3-fold in the first 1536 positions, corresponding to 6, 10, and 28 hits, respectively (out of a total of 307; activity defined as average residual activity of less than 80%). The conclusions are twofold. On the one hand, the exact fragment-matching similarity searching method employed here is not capable of finding completely novel hit structures. On the other hand, this study emphasizes the requirement for a comparable distribution of chemical features in the training and test sets. MOLPRINT 2D is freely downloadable from http://www.cheminformatics.org.
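The two quantities the abstract relies on, the enrichment factor at a given list position and the pooled random split into equally sized sets, can be illustrated with a minimal Python sketch. It assumes the common definition of enrichment as the hit rate in the top-ranked positions divided by the hit rate in the whole library; the abstract does not state the exact formula used. The function names `enrichment_factor` and `pooled_random_split` and the example numbers are hypothetical and are not taken from MOLPRINT 2D or from the study's data.

```python
import random
from typing import List, Sequence, Tuple


def enrichment_factor(ranked_labels: Sequence[bool], top_n: int) -> float:
    """Hit rate in the first top_n ranked positions divided by the
    hit rate in the whole library (assumed definition, see lead-in)."""
    total = len(ranked_labels)
    actives = sum(ranked_labels)
    if not 0 < top_n <= total:
        raise ValueError("top_n must be between 1 and the library size")
    if actives == 0:
        raise ValueError("the library contains no actives")
    hits = sum(ranked_labels[:top_n])
    # (hits / top_n) / (actives / total), rearranged to a single division
    return (hits * total) / (top_n * actives)


def pooled_random_split(
    train: Sequence, test: Sequence, seed: int = 0
) -> Tuple[List, List]:
    """Pool two libraries, shuffle them, and cut the pool in half."""
    pooled = list(train) + list(test)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pooled)
    half = len(pooled) // 2
    return pooled[:half], pooled[half:]


if __name__ == "__main__":
    # Hypothetical ranking: 1000 compounds, 20 actives, 10 of them in the top 100.
    ranking = [True] * 10 + [False] * 90 + [True] * 10 + [False] * 890
    print(enrichment_factor(ranking, 100))  # 5.0

    new_train, new_test = pooled_random_split(range(10), range(10, 20))
    print(len(new_train), len(new_test))  # 10 10
```

Under this definition, an enrichment factor of 1 means the ranking does no better than random selection, which is why values between 1 and 3 are described as marginal.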