Benchmarking Cross-Docking Strategies for Structure-Informed Machine Learning in Kinase Drug Discovery.

Authors

Schaller David, Christ Clara D, Chodera John D, Volkamer Andrea

Affiliations

In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.

Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.

Publication information

bioRxiv. 2023 Sep 14:2023.09.11.557138. doi: 10.1101/2023.09.11.557138.

DOI:10.1101/2023.09.11.557138

PMID:37745489

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10515787/

Abstract

In recent years, machine learning has transformed many aspects of the drug discovery process, including small-molecule design, for which bioactivity prediction is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches, but is fundamentally limited by the accuracy with which protein:ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase:inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark that assesses how well different classes of docking and pose selection strategies recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures co-crystallized with 423 ATP-competitive ligands. We find that docking methods biased by the co-crystallized ligand (utilizing shape overlap with or without maximum common substructure matching) are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance of generating a low-RMSD docking pose. Docking with an approach that combines all three methods (Posit) into structures with the most similar co-crystallized ligands according to shape and electrostatics proved to be the most efficient way to reproduce binding poses, achieving a success rate of 66.9% across all included systems. The studied docking and pose selection strategies, which utilize the OpenEye Toolkit, were implemented in pipelines of the KinoML framework, allowing automated and reliable protein:ligand complex generation for future downstream machine learning tasks. Although focused on protein kinases, we believe the general findings can also be transferred to other protein families.
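
As a rough illustration of how a cross-docking success rate of this kind might be tallied, the Python sketch below compares docking each ligand into a single structure with keeping the lowest-RMSD pose across several structures. It is not the authors' KinoML pipeline: the function name `success_rate`, the 2.0 Å cutoff (a common convention that the abstract does not state), and the RMSD values are all hypothetical and chosen only to show the bookkeeping.

```python
def success_rate(rmsds, threshold=2.0):
    """Return the fraction of systems whose pose RMSD is <= threshold.

    The 2.0 angstrom default is an assumed convention, not a value
    taken from the abstract.
    """
    rmsds = list(rmsds)
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(r <= threshold for r in rmsds) / len(rmsds)


# Hypothetical toy data: for each ligand, the RMSD (in angstroms) of the
# docked pose obtained from each of three different receptor structures.
toy_rmsds = {
    "ligand_a": [3.1, 1.4, 5.0],
    "ligand_b": [0.9, 2.6, 1.1],
    "ligand_c": [4.2, 3.8, 2.9],
    "ligand_d": [2.5, 1.9, 6.3],
}

# Docking into one structure only (here simply the first one listed).
single_structure = success_rate(poses[0] for poses in toy_rmsds.values())

# Docking into all structures and keeping the lowest-RMSD pose. This is an
# upper bound: it assumes a perfect pose selection step, which a real
# workflow without the experimental answer would not have.
best_of_all = success_rate(min(poses) for poses in toy_rmsds.values())

print(f"single structure: {single_structure:.2f}")
print(f"best of all structures: {best_of_all:.2f}")
```

In this toy set the best-of-all rate can never be lower than the single-structure rate, which mirrors the abstract's observation that docking into multiple structures raises the chance of generating a low-RMSD pose. How close a practical pose selection strategy gets to that upper bound is the kind of question the benchmark addresses.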
