Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data.

Author information

Valsson Ísak, Warren Matthew T, Deane Charlotte M, Magarkar Aniket, Morris Garrett M, Biggin Philip C

Affiliations

Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK.

Structural Bioinformatics and Computational Biochemistry, Department of Biochemistry, University of Oxford, Oxford, UK.

Publication information

Commun Chem. 2025 Feb 8;8(1):41. doi: 10.1038/s42004-025-01428-y.

Abstract

Machine learning offers great promise for fast and accurate binding affinity predictions. However, current models lack robust evaluation and fail on tasks encountered in (hit-to-)lead optimisation, such as ranking the binding affinity of a congeneric series of ligands, thereby limiting their application in drug discovery. Here, we address these issues by first introducing a novel attention-based graph neural network model called AEV-PLIG (atomic environment vector-protein ligand interaction graph). Second, we introduce a new and more realistic out-of-distribution test set called the OOD Test. We benchmark our model on this set, CASF-2016, and a test set used for free energy perturbation (FEP) calculations, which not only highlights the competitive performance of AEV-PLIG but also provides a realistic assessment of machine learning models against rigorous physics-based approaches. Moreover, we demonstrate how leveraging augmented data (generated using template-based modelling or molecular docking) can significantly improve binding affinity prediction correlation and ranking on the FEP benchmark (weighted mean PCC and Kendall's τ increase from 0.41 and 0.26 to 0.59 and 0.42). These strategies together are closing the performance gap with FEP calculations (FEP+ achieves weighted mean PCC and Kendall's τ of 0.68 and 0.49 on the FEP benchmark) while being ~400,000 times faster.
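The abstract reports a weighted mean PCC and Kendall's τ over the FEP benchmark. As a rough illustration of how such a summary could be computed, the sketch below calculates both statistics per congeneric series and averages them with weights. The weighting by number of ligands per series, the use of the tau-a variant, and the names `benchmark_summary` and `toy_series` are assumptions made for this example. The abstract does not specify them, and the data are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def benchmark_summary(series):
    """Return (weighted mean PCC, weighted mean tau) over all series.

    `series` maps a series name to (experimental, predicted) affinities.
    ASSUMPTION: each series is weighted by its number of ligands.
    """
    weights = [len(exp) for exp, _ in series.values()]
    pccs = [pearson(exp, pred) for exp, pred in series.values()]
    taus = [kendall_tau(exp, pred) for exp, pred in series.values()]
    return weighted_mean(pccs, weights), weighted_mean(taus, weights)


# Invented toy data: one perfectly ranked series, one fully inverted series.
toy_series = {
    "series_a": ([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]),
    "series_b": ([1.0, 2.0], [2.0, 1.0]),
}
mean_pcc, mean_tau = benchmark_summary(toy_series)
```

Averaging per series rather than pooling all ligands keeps the metric focused on ranking within each congeneric series, which is the lead-optimisation task the abstract highlights.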

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b836/11807228/98bd74ae3042/42004_2025_1428_Fig1_HTML.jpg
