Mehta Sarvesh, Laghuvarapu Siddhartha, Pathak Yashaswi, Sethi Aaftaab, Alvala Mallika, Priyakumar U Deva
Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India.
Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, Hyderabad 500 037, India.
Chem Sci. 2021 Jul 26;12(35):11710-11721. doi: 10.1039/d1sc02783b. eCollection 2021 Sep 15.
In drug discovery, high-throughput virtual screening is routinely used to identify an initial set of candidate molecules, referred to as "hits". In such a screen, each molecule in a large small-molecule library is evaluated for a physical property such as its docking score against a target receptor. Real-world libraries are extremely large yet still represent only a tiny fraction of the essentially infinite chemical space, and evaluating such properties for every molecule in a library is not computationally feasible. This study proposes MEMES, a novel Machine learning framework for Enhanced MolEcular Screening based on Bayesian optimization, for efficient sampling of chemical space. The framework is shown to identify 90% of the top-1000 molecules in a library of about 100 million molecules while calculating docking scores for only about 6% of the library. We believe such a framework could substantially reduce computational effort not only in drug discovery but also in other areas that require high-throughput experiments of this kind.
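The screening loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the surrogate model, acquisition function, molecular features, or batch sizes, so everything here is an assumption made for illustration: a closed-form Bayesian linear regression as the surrogate, a lower-confidence-bound acquisition (lower docking scores are treated as better), random feature vectors in place of real molecular descriptors, and a synthetic `score_fn` standing in for an expensive docking run. It is not the authors' implementation.

```python
import numpy as np

def fit_bayes_linear(X, y, alpha=1.0, noise=0.1):
    """Posterior over weights for Bayesian linear regression, prior N(0, I/alpha)."""
    A = alpha * np.eye(X.shape[1]) + X.T @ X / noise**2
    cov = np.linalg.inv(A)
    mean = cov @ X.T @ y / noise**2
    return mean, cov

def predict(X, mean, cov, noise=0.1):
    """Predictive mean and standard deviation for every row of X."""
    mu = X @ mean
    var = np.einsum("ij,jk,ik->i", X, cov, X) + noise**2
    return mu, np.sqrt(var)

def screen(features, score_fn, n_init=200, batch=200, rounds=5, kappa=1.0, seed=0):
    """Iteratively score only the molecules the surrogate finds most promising."""
    rng = np.random.default_rng(seed)
    scored = {}  # library index -> score returned by the expensive evaluation
    for i in rng.choice(len(features), size=n_init, replace=False):
        scored[int(i)] = score_fn(int(i))
    for _ in range(rounds):
        idx = np.array(list(scored.keys()))
        y = np.array([scored[i] for i in idx])
        mean, cov = fit_bayes_linear(features[idx], y)
        mu, sd = predict(features, mean, cov)
        acq = mu - kappa * sd      # lower confidence bound: low value = promising
        acq[idx] = np.inf          # never re-score a molecule
        for i in np.argsort(acq)[:batch]:
            scored[int(i)] = score_fn(int(i))
    return scored

# Synthetic stand-in for a library: 20,000 "molecules" with 16 hypothetical features.
rng = np.random.default_rng(42)
features = rng.normal(size=(20_000, 16))
true_w = rng.normal(size=16)
scores = features @ true_w + 0.05 * rng.normal(size=len(features))

# Score 200 + 5 * 200 = 1,200 molecules, i.e. 6% of the synthetic library.
scored = screen(features, score_fn=lambda i: float(scores[i]))

# Fraction of the true top-100 (lowest scores) that the loop actually evaluated.
true_top = set(np.argsort(scores)[:100].tolist())
recall = len(true_top & set(scored)) / len(true_top)
print(f"scored {len(scored)} of {len(features)} molecules, top-100 recall = {recall:.2f}")
```

For comparison, scoring a uniformly random 6% of a library would be expected to recover only about 6% of its top molecules. The synthetic scoring function above is linear in the features, which makes the task far easier than real docking, so the recall printed here says nothing about the performance reported in the abstract.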