Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands.
Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands.
Nat Comput Sci. 2024 Oct;4(10):786-796. doi: 10.1038/s43588-024-00697-2. Epub 2024 Sep 27.
Deep learning is accelerating drug discovery. However, current approaches are often limited by the available data, in terms of either size or molecular diversity. Active deep learning has high potential for low-data drug discovery, as it allows iterative model improvement during the screening process. However, several 'known unknowns' limit the wider adoption of active deep learning in drug discovery: (1) what the best computational strategies are for chemical space exploration, (2) how active learning compares with traditional, non-iterative approaches and (3) how it should be used in the low-data scenarios typical of drug discovery. To provide answers, this study simulates a low-data drug discovery scenario and systematically analyzes six active learning strategies combined with two deep learning architectures on three large-scale molecular libraries. We identify the most important determinants of success in low-data regimes and show that active learning can achieve up to a sixfold improvement in hit discovery when compared with traditional screening methods.
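The iterative screening idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it comes from the study: the synthetic 2-D "library", the hidden activity rule, the similarity-weighted scoring function (a stand-in for a deep learning model) and the names `make_library`, `score`, `screen` and `count_hits` are all hypothetical. It contrasts a single greedy acquisition strategy with random selection under the same screening budget; it is not one of the six strategies or two architectures evaluated in the paper.

```python
import random


def make_library(n=500, seed=0):
    """Toy 'molecular library': 2-D feature vectors with hidden activity labels."""
    rng = random.Random(seed)
    library = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        # Hypothetical rule: actives cluster in one corner of feature space.
        active = x[0] > 0.8 and x[1] > 0.8
        library.append((x, active))
    return library


def score(x, labelled):
    """Stand-in model: inverse-distance-weighted vote of already screened items."""
    num = den = 0.0
    for xi, yi in labelled:
        d2 = (x[0] - xi[0]) ** 2 + (x[1] - xi[1]) ** 2
        w = 1.0 / (d2 + 1e-6)
        num += w * yi
        den += w
    return num / den if den else 0.0


def screen(library, n_start, batch, cycles, strategy, seed=0):
    """Screen n_start random items, then `cycles` batches chosen by `strategy`.

    Returns the indices of all screened items, in screening order.
    """
    rng = random.Random(seed)
    pool = list(range(len(library)))
    rng.shuffle(pool)
    picked, pool = pool[:n_start], pool[n_start:]
    for _ in range(cycles):
        # "Retrain": the model here is simply the set of labels seen so far.
        labelled = [(library[i][0], float(library[i][1])) for i in picked]
        if strategy == "greedy":
            # Exploit: screen the items the current model ranks highest.
            pool.sort(key=lambda i: score(library[i][0], labelled), reverse=True)
        else:
            # Baseline: screen a random batch, ignoring the model.
            rng.shuffle(pool)
        picked, pool = picked + pool[:batch], pool[batch:]
    return picked


def count_hits(library, picked):
    """Number of active items among the screened indices."""
    return sum(1 for i in picked if library[i][1])


if __name__ == "__main__":
    lib = make_library()
    for strategy in ("greedy", "random"):
        picked = screen(lib, n_start=20, batch=10, cycles=5, strategy=strategy)
        print(strategy, count_hits(lib, picked), "hits in", len(picked), "screened")
```

Both runs start from the same random initial set and screen the same number of items, so any difference in hit count comes only from how each batch is chosen. Results on this toy data say nothing about the effect sizes reported in the study.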