Data Set Augmentation Allows Deep Learning-Based Virtual Screening to Better Generalize to Unseen Target Classes and Highlight Important Binding Interactions.

Affiliations

Department of Statistics, University of Oxford, 24-29 St Giles, Oxford OX1 3LB, U.K.

BenevolentAI, 4-8 Maple Street, London W1T 5HD, U.K.

Publication information

J Chem Inf Model. 2020 Aug 24;60(8):3722-3730. doi: 10.1021/acs.jcim.0c00263. Epub 2020 Aug 4.

DOI:10.1021/acs.jcim.0c00263

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7611237/

Abstract

Current deep learning methods for structure-based virtual screening take the structures of both the protein and the ligand as input but make little or no use of the protein structure when predicting ligand binding. Here, we show how a relatively simple method of data set augmentation forces such deep learning methods to take into account information from the protein. Models trained in this way are more generalizable (make better predictions on protein/ligand complexes from a different distribution to the training data). They also assign more meaningful importance to the protein and ligand atoms involved in binding. Overall, our results show that data set augmentation can help deep learning-based virtual screening to learn physical interactions rather than data set biases.
