Reimer Bryn Marie, Awoonor-Williams Ernest, Golosov Andrei A, Hornak Viktor
Computer-Aided Drug Discovery, Global Discovery Chemistry, Novartis Biomedical Research, 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.
Manning College of Information & Computer Sciences, University of Massachusetts Amherst, 140 Governors Drive, Amherst, Massachusetts 01003, United States.
J Chem Inf Model. 2025 Jan 27;65(2):544-553. doi: 10.1021/acs.jcim.4c01281. Epub 2025 Jan 8.
Targeted covalent inhibition is a powerful therapeutic modality in the drug discoverer's toolbox. Recent advances in covalent drug discovery, particularly in targeting cysteines, have led to significant breakthroughs for traditionally challenging targets such as mutant KRAS, which is implicated in diverse human cancers. However, identifying cysteines for targeted covalent inhibition is a difficult task, as experimental and in silico tools have shown limited accuracy. Using the recently released CovPDB and CovBinderInPDB databases, we have trained and tested interpretable machine learning (ML) models to identify cysteines that are liable to be covalently modified (i.e., "ligandable" cysteines). We explored myriad physicochemical features (pKa, solvent exposure, residue electrostatics, etc.) and protein-ligand pocket descriptors in our ML models. Our final logistic regression model achieved a median F1 score of 0.73 on held-out test sets. When tested on a small sample of proteins, our model also showed reasonable performance, accurately predicting the most ligandable cysteine in most cases. Taken together, these results indicate that we can accurately predict potential ligandable cysteines for targeted covalent drug discovery, privileging cysteines that are more likely to be selective rather than purely reactive. We release this tool to the scientific community as CovCysPredictor.
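As an illustration of the reported metric only, the following minimal sketch shows how a median F score (here the balanced F1 variant) could be computed across several held-out test splits. It is not the CovCysPredictor code: the function names `f1_score` and `median_f1` and the toy labels are hypothetical, and the labels assume 1 marks a ligandable cysteine and 0 a non-ligandable one.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); label 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def median_f1(splits):
    """Median F1 over (y_true, y_pred) pairs, one pair per held-out split."""
    return median(f1_score(y_true, y_pred) for y_true, y_pred in splits)


# Toy data (made up for illustration, not from the paper).
toy_splits = [
    ([1, 0, 1, 0], [1, 0, 0, 0]),  # one hit, one missed cysteine
    ([1, 1, 0, 0], [1, 1, 1, 0]),  # two hits, one false alarm
    ([0, 1, 0, 1], [0, 1, 0, 1]),  # perfect split
]

if __name__ == "__main__":
    print(f"median F1 over toy splits: {median_f1(toy_splits):.2f}")
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually easy or hard split.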