Chemical Space Exploration with Active Learning and Alchemical Free Energies.
Author information
Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany.
Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium.
Publication information
J Chem Theory Comput. 2022 Oct 11;18(10):6259-6270. doi: 10.1021/acs.jctc.2c00752. Epub 2022 Sep 23.
Drug discovery can be thought of as a search for a needle in a haystack: searching through a large chemical space for the most active compounds. Computational techniques can narrow the search space for experimental follow-up, but even they become unaffordable when evaluating large numbers of molecules. Therefore, machine learning (ML) strategies are being developed as computationally cheaper complementary techniques for navigating and triaging large chemical libraries. Here, we explore how an active learning protocol can be combined with first-principles-based alchemical free energy calculations to identify high-affinity phosphodiesterase 2 (PDE2) inhibitors. We first calibrate the procedure using a set of experimentally characterized PDE2 binders. The optimized protocol is then used prospectively on a large chemical library to navigate toward potent inhibitors. In the active learning cycle, at every iteration a small fraction of compounds is probed by alchemical calculations and the obtained affinities are used to train ML models. With successive rounds, high-affinity binders are identified by explicitly evaluating only a small subset of compounds in a large chemical library, thus providing an efficient protocol that robustly identifies a large fraction of true positives.
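The active learning cycle described above can be illustrated with a minimal sketch. It is not the authors' protocol: the ridge-regression surrogate, the greedy selection rule, the random first batch, and the synthetic library are all assumptions made for illustration, and every function name here (`make_synthetic_library`, `active_learning`, `oracle`) is hypothetical. The `oracle` callable stands in for an expensive alchemical free energy calculation and simply looks up a synthetic binding free energy, where more negative means tighter binding.

```python
import numpy as np


def make_synthetic_library(n_compounds=1000, n_features=8, seed=0):
    """Build a toy library: random descriptors and a noisy linear 'true' dG.

    Purely synthetic stand-in data, not real PDE2 affinities.
    """
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_compounds, n_features))
    weights = rng.normal(size=n_features)
    true_dg = features @ weights + 0.5 * rng.normal(size=n_compounds)
    return features, true_dg


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; a constant column provides the intercept
    (the intercept is regularized too, which keeps the sketch short)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict_ridge(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def active_learning(features, oracle, n_rounds=5, batch_size=20, seed=0):
    """Run n_rounds of select -> evaluate -> retrain.

    oracle(i) returns the (expensive) free energy of compound i.
    Returns a dict {compound index: evaluated free energy}.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]

    # Round 1: no model yet, so evaluate a random batch (an assumption).
    first = rng.choice(n, size=min(batch_size, n), replace=False)
    labels = {int(i): oracle(int(i)) for i in first}

    for _ in range(n_rounds - 1):
        idx = np.array(list(labels))
        y = np.array([labels[int(i)] for i in idx])

        # Retrain the surrogate on everything evaluated so far.
        w = fit_ridge(features[idx], y)
        pred = predict_ridge(w, features)

        # Greedy selection: lowest predicted dG among unevaluated compounds.
        pred[idx] = np.inf
        k = min(batch_size, n - len(labels))
        for i in np.argsort(pred)[:k]:
            labels[int(i)] = oracle(int(i))

    return labels


features, true_dg = make_synthetic_library()
labels = active_learning(features, oracle=lambda i: float(true_dg[i]))

# Fraction of the 50 truly best compounds recovered after evaluating 10% of the library.
top50 = set(np.argsort(true_dg)[:50].tolist())
recall = len(top50 & set(labels)) / 50
print(f"evaluated {len(labels)} of {len(true_dg)} compounds, top-50 recall = {recall:.2f}")
```

Purely greedy selection is only one possible choice. Strategies that mix in uncertain or diverse compounds trade some short-term hits for better model coverage, and which rule works best depends on the library and the surrogate model.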