Computer Aided Drug Design Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Maryland 21201, United States.
J Chem Inf Model. 2024 Oct 14;64(19):7743-7757. doi: 10.1021/acs.jcim.4c01189. Epub 2024 Sep 16.
Identifying druggable binding sites on proteins is an important and challenging problem, particularly for cryptic, allosteric binding sites that may not be obvious from X-ray, cryo-EM, or predicted structures. The Site-Identification by Ligand Competitive Saturation (SILCS) method accounts for the flexibility of the target protein using all-atom molecular simulations that include various small-molecule solutes in aqueous solution. During the simulations, the combination of protein flexibility and comprehensive sampling of the water and solute spatial distributions can identify buried binding pockets that are absent from experimentally determined structures. Previously, we reported a method that uses the information in the SILCS sampling to identify binding sites (termed Hotspots) of small mono- or bicyclic compounds, a subset of which coincide with known binding sites of drug-like molecules. Here, we build on that physics-based approach and present a machine learning (ML) model for ranking the Hotspots according to the likelihood that they can accommodate drug-like molecules (e.g., molecular weight >200 Da). In the independent validation set, which includes various enzymes and receptors, our model recalls 67% and 89% of experimentally validated ligand binding sites in the top 10 and top 20 ranked Hotspots, respectively. Furthermore, we show that the model's output Decision Function is a useful metric for predicting binding sites and their potential druggability in new targets. Given the utility of the SILCS method for ligand discovery and optimization, the tools presented represent an important advance in the identification of orthosteric and allosteric binding sites and in the discovery of drug-like molecules targeting those sites.
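The ranking and top-k recall figures quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `rank_hotspots` and `top_k_recall`, the toy scores, and the labels are all hypothetical, and the sketch assumes that recall means the fraction of known binding-site Hotspots that appear among the k highest-scoring Hotspots, with a higher Decision Function value meaning a more likely site. The paper may define or aggregate recall differently (for example, per protein).

```python
from typing import List, Sequence


def rank_hotspots(scores: Sequence[float]) -> List[int]:
    """Return Hotspot indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def top_k_recall(scores: Sequence[float],
                 is_known_site: Sequence[bool],
                 k: int) -> float:
    """Fraction of known-site Hotspots found among the k top-ranked Hotspots."""
    if len(scores) != len(is_known_site):
        raise ValueError("scores and labels must have the same length")
    total_known = sum(1 for flag in is_known_site if flag)
    if total_known == 0:
        raise ValueError("at least one known binding site is required")
    top_k = rank_hotspots(scores)[:k]
    found = sum(1 for i in top_k if is_known_site[i])
    return found / total_known


# Toy data (hypothetical): one score per Hotspot, plus a flag marking
# whether that Hotspot overlaps an experimentally validated ligand site.
scores = [1.2, -0.4, 0.8, 0.1, -1.5, 0.5]
labels = [True, False, False, True, False, True]

ranking = rank_hotspots(scores)
recall_at_3 = top_k_recall(scores, labels, k=3)
recall_at_4 = top_k_recall(scores, labels, k=4)
print(ranking, round(recall_at_3, 3), recall_at_4)
```

In this toy case, two of the three known sites fall in the top three ranked Hotspots and all three fall in the top four, which mirrors how recall rises from the top-10 to the top-20 cutoff in the abstract.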