The Alzheimer's Research UK University College London Drug Discovery Institute, London, UK.
Department of Computer Science, University College London, London, UK.
SLAS Discov. 2021 Feb;26(2):257-262. doi: 10.1177/2472555220949495. Epub 2020 Aug 18.
Iterative screening is a process in which screening is done in batches, with each batch filled by using machine learning to select the most promising compounds from the library based on the previous results. We believe iterative screening is poised to enhance the screening process by improving hit finding while at the same time reducing the number of compounds screened. In addition, we see this process as a key enabler of next-generation high-throughput screening (HTS), which uses more complex assays that better describe the biology but demand more resource per screened compound. To demonstrate the utility of these methods, we retrospectively analyze HTS data from PubChem with a focus on machine learning-based screening strategies that can be readily implemented in practice. Our results show that over a variety of HTS experimental paradigms, an iterative screening setup that screens a total of 35% of the screening collection over as few as three iterations has a median return rate of approximately 70% of the active compounds. Increasing the portion of the library screened to 50% yields median returns of approximately 80% of actives. Using six iterations increases these return rates to 78% and 90%, respectively. The best results were achieved with machine learning models that can be run on a standard desktop. By demonstrating that the utility of iterative screening holds true even with a small number of iterations, and without requiring significant computational resources, we provide a roadmap for the practical implementation of these techniques in hit finding.
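The batch-wise loop described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline. All names (`make_library`, `iterative_screen`, `tanimoto`) are hypothetical, the fingerprints are random feature sets rather than PubChem data, the first batch is assumed to be drawn at random, and a simple nearest-neighbor similarity score stands in for the machine learning models used in the study.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def make_library(n=500, n_actives=25, seed=0):
    """Build a synthetic library (assumption: actives share a common 'scaffold')."""
    rng = random.Random(seed)
    scaffold = frozenset(range(8))  # features shared by every active
    compounds = []
    for i in range(n):
        noise = frozenset(rng.sample(range(8, 64), 6))
        is_active = i < n_actives
        compounds.append((noise | scaffold if is_active else noise, is_active))
    rng.shuffle(compounds)
    return [fp for fp, _ in compounds], [label for _, label in compounds]


def iterative_screen(library, assay, n_iterations=3, fraction=0.35, seed=0):
    """Screen `fraction` of the library in `n_iterations` batches.

    library: list of frozenset fingerprints
    assay:   callable taking a compound index and returning True if active
    Returns a dict mapping each screened index to its assay result.
    """
    rng = random.Random(seed)
    budget = int(len(library) * fraction)
    batch_size = budget // n_iterations
    unscreened = set(range(len(library)))
    results = {}
    for iteration in range(n_iterations):
        last = iteration == n_iterations - 1
        size = budget - len(results) if last else batch_size
        actives = [i for i, hit in results.items() if hit]
        if not actives:
            # No hits to learn from yet: fall back to a random batch.
            batch = rng.sample(sorted(unscreened), size)
        else:
            # Stand-in model: rank by highest similarity to any known active.
            ranked = sorted(
                unscreened,
                key=lambda i: max(tanimoto(library[i], library[a]) for a in actives),
                reverse=True,
            )
            batch = ranked[:size]
        for i in batch:
            results[i] = assay(i)
        unscreened.difference_update(batch)
    return results


library, labels = make_library()
results = iterative_screen(library, lambda i: labels[i])
recall = sum(results.values()) / sum(labels)
print(f"screened {len(results)} of {len(library)}, recovered {recall:.0%} of actives")
```

In a real campaign the assay callable would be replaced by the measured results of each plated batch, and the similarity ranking by whichever trained model is being evaluated. The `n_iterations` and `fraction` arguments correspond to the quantities varied in the abstract (three or six iterations, 35% or 50% of the collection).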