Department of Life Science, Imperial College London, London, SW7 2AZ United Kingdom.
J Phys Chem B. 2012 Jun 14;116(23):6732-9. doi: 10.1021/jp212084f. Epub 2012 Mar 28.
The Investigational Novel Drug Discovery by Example (INDDEx) package has been developed to find active compounds by linking activity to chemical substructure, and to guide the subsequent drug development process. INDDEx is a machine-learning technique that forms qualitative logical rules about the substructural features of active molecules, weights these rules to form a quantitative model, and then uses the model to screen a molecular database. INDDEx is shown to learn from multiple active compounds and to be useful for scaffold hopping in virtual screening, giving high retrieval rates even when learning from a small number of compounds. Across the data sets tested, INDDEx achieved average enrichment factors at 1% of the data of 69.2, 82.7, and 90.4 when learning from 2, 4, and 8 active ligands, respectively. At 0.1% of the data, the corresponding average enrichment factors were 492, 631, and 707. When all ligands with a Tanimoto Maximum Common Substructure similarity greater than 0.5 were excluded, INDDEx had average enrichment factors at 1% of the data of 52.3, 63.6, and 66.9 when learning from 2, 4, and 8 active ligands, respectively. The performance of INDDEx is compared with that of eHiTS LASSO, PharmaGist, and DOCK.
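The results above are reported as enrichment factors (EF) at 1% and 0.1% of the data. Under the standard definition, EF at a fraction x is the hit rate among the top-ranked x of the screened database divided by the hit rate expected from random selection: EF = (hits in top / size of top) / (total actives / database size). When actives make up no more than x of the database, the maximum possible EF is 1/x, i.e. 100 at 1% and 1,000 at 0.1%, which gives a scale for the reported values. The sketch below computes this standard definition. It is not INDDEx code: the function name `enrichment_factor` and the toy data are hypothetical, and the choices to round the top fraction to the nearest whole compound and to break score ties by input order are assumptions (published studies differ on these details).

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool], fraction: float) -> float:
    """Standard enrichment factor at `fraction` of a ranked database (hypothetical helper).

    EF = (actives in top fraction / size of top fraction) / (total actives / database size)
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty database with at least one active")
    # Assumption: the top fraction is rounded to the nearest whole compound (at least one).
    n_top = max(1, round(fraction * n_total))
    # Rank by descending score; Python's sort is stable, so ties keep input order.
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    # Rearranged as integer products before one division to avoid extra rounding.
    return (hits * n_total) / (n_top * n_actives)


# Toy example with made-up data: 1,000 compounds, 10 actives, scores ranking by index.
scores = [float(1000 - i) for i in range(1000)]
active_ids = {0, 1, 2, 3, 4, 5, 6, 7, 500, 900}
is_active = [i in active_ids for i in range(1000)]
print(enrichment_factor(scores, is_active, 0.01))  # → 80.0
```

In the toy example, 8 of the 10 actives fall in the top 10 compounds (1% of 1,000), giving EF = (8/10) / (10/1000) = 80; ranking all 10 actives first would give the maximum of 100.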