Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?

Affiliations

Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.

State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, SAR, China.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa410.

DOI:10.1093/bib/bbaa410

PMID:33418562

Abstract

Machine-learning (ML)-based scoring functions (MLSFs) have gradually emerged as a promising alternative for protein-ligand binding affinity prediction and structure-based virtual screening. However, doubts remain about the benefits of this new type of scoring function (SF). In this study, to benchmark target-specific MLSFs on a relatively unbiased dataset, MLSFs trained on three representative protein-ligand interaction representations were assessed on the LIT-PCBA dataset, with the classical Glide SP SF and three types of ligand-based quantitative structure-activity relationship (QSAR) models used for comparison. Two major aspects of virtual screening campaigns, prediction accuracy and hit novelty, were systematically explored. The results show that the tested target-specific MLSFs generally outperformed the classical Glide SP SF but could hardly outperform the 2D fingerprint-based QSAR models. Although integrating multiple types of protein-ligand interaction features gave substantial improvements, the MLSFs still did not exceed the MACCS-based QSAR models. Judging by the correlations between hit ranks and by the structures of the top-ranked hits, MLSFs developed with different featurization strategies can identify quite different hits. Nevertheless, target-specific MLSFs do not seem to have the intrinsic attributes of a traditional SF and may not be a substitute for classical SFs. Instead, MLSFs can be regarded as a new derivative of ligand-based QSAR models. We expect this study to provide valuable guidance for the assessment and further development of target-specific MLSFs.
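The abstract compares models by the correlation between their hit ranks and by how much their top-ranked hits differ. The following is a minimal sketch of one way such a comparison could be set up, assuming each model simply assigns a score to every compound ID. The function names, the choice of Spearman correlation and Jaccard overlap, and the example scores are all illustrative assumptions; the abstract does not state which statistics the authors actually used.

```python
from typing import Dict, List


def rank_descending(scores: Dict[str, float]) -> Dict[str, float]:
    """Return 1-based ranks (highest score = rank 1); tied scores share the average rank."""
    ordered: List[str] = sorted(scores, key=lambda k: scores[k], reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of compounds tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman rank correlation over the compounds scored by both models."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        raise ValueError("need at least two shared compounds")
    ra = rank_descending({k: a[k] for k in common})
    rb = rank_descending({k: b[k] for k in common})
    mean = (len(common) + 1) / 2  # mean rank, also valid with averaged ties
    cov = sum((ra[k] - mean) * (rb[k] - mean) for k in common)
    var_a = sum((ra[k] - mean) ** 2 for k in common)
    var_b = sum((rb[k] - mean) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        raise ValueError("all scores tied; correlation is undefined")
    return cov / (var_a * var_b) ** 0.5


def top_k_overlap(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Jaccard overlap between the top-k compounds of two models."""
    top_a = set(sorted(a, key=lambda c: a[c], reverse=True)[:k])
    top_b = set(sorted(b, key=lambda c: b[c], reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)


if __name__ == "__main__":
    # Hypothetical scores from two models for five compounds (not real data).
    model_a = {"c1": 0.9, "c2": 0.8, "c3": 0.4, "c4": 0.3, "c5": 0.1}
    model_b = {"c1": 0.2, "c2": 0.7, "c3": 0.9, "c4": 0.1, "c5": 0.5}
    print("Spearman rho:", round(spearman(model_a, model_b), 3))
    print("Top-2 overlap:", round(top_k_overlap(model_a, model_b, 2), 3))
```

A low rank correlation together with a small top-k overlap would suggest that two models prioritize different compounds, which is the sense of "hit novelty" in the abstract. Comparing the structures of the top-ranked hits would additionally need a molecular similarity measure, which this sketch leaves out.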

Similar articles

Beware of the generic machine learning-based scoring functions in structure-based virtual screening.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa070.

TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions.

J Med Chem. 2022 Jun 9;65(11):7918-7932. doi: 10.1021/acs.jmedchem.2c00460. Epub 2022 Jun 1.

TB-IECS: an accurate machine learning-based scoring function for virtual screening.

J Cheminform. 2023 Jul 4;15(1):63. doi: 10.1186/s13321-023-00731-x.

Topology-Based and Conformation-Based Decoys Database: An Unbiased Online Database for Training and Benchmarking Machine-Learning Scoring Functions.

J Med Chem. 2023 Jul 13;66(13):9174-9183. doi: 10.1021/acs.jmedchem.3c00801. Epub 2023 Jun 14.

Boosted neural networks scoring functions for accurate ligand docking and ranking.

J Bioinform Comput Biol. 2018 Apr;16(2):1850004. doi: 10.1142/S021972001850004X. Epub 2018 Feb 4.

Task-Specific Scoring Functions for Predicting Ligand Binding Poses and Affinity and for Screening Enrichment.

J Chem Inf Model. 2018 Jan 22;58(1):119-133. doi: 10.1021/acs.jcim.7b00309. Epub 2017 Dec 20.

Improving structure-based virtual screening performance via learning from scoring function components.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa094.

Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening.

J Chem Inf Model. 2022 Nov 28;62(22):5485-5502. doi: 10.1021/acs.jcim.2c01149. Epub 2022 Oct 21.

ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad295.

Cited by

Reducing overconfident errors in molecular property classification using Posterior Network.

Patterns (N Y). 2024 May 8;5(6):100991. doi: 10.1016/j.patter.2024.100991. eCollection 2024 Jun 14.

Comprehensive machine learning boosts structure-based virtual screening for PARP1 inhibitors.

J Cheminform. 2024 Apr 7;16(1):40. doi: 10.1186/s13321-024-00832-1.

A practical guide to machine-learning scoring for structure-based virtual screening.

Nat Protoc. 2023 Nov;18(11):3460-3511. doi: 10.1038/s41596-023-00885-w. Epub 2023 Oct 16.

A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers.

Chem Sci. 2023 Jul 4;14(30):8129-8146. doi: 10.1039/d3sc02044d. eCollection 2023 Aug 2.

TB-IECS: an accurate machine learning-based scoring function for virtual screening.

J Cheminform. 2023 Jul 4;15(1):63. doi: 10.1186/s13321-023-00731-x.

Beware of Simple Methods for Structure-Based Virtual Screening: The Critical Importance of Broader Comparisons.

J Chem Inf Model. 2023 Mar 13;63(5):1401-1405. doi: 10.1021/acs.jcim.3c00218. Epub 2023 Feb 27.

Comprehensive Survey of Consensus Docking for High-Throughput Virtual Screening.

Molecules. 2022 Dec 25;28(1):175. doi: 10.3390/molecules28010175.

Protein-Ligand Docking in the Machine-Learning Era.

Molecules. 2022 Jul 18;27(14):4568. doi: 10.3390/molecules27144568.

Structure-based virtual screening for PDL1 dimerizers: Evaluating generic scoring functions.

Curr Res Struct Biol. 2022 Jun 9;4:206-210. doi: 10.1016/j.crstbi.2022.06.002. eCollection 2022.

Delta Machine Learning to Improve Scoring-Ranking-Screening Performances of Protein-Ligand Scoring Functions.

J Chem Inf Model. 2022 Jun 13;62(11):2696-2712. doi: 10.1021/acs.jcim.2c00485. Epub 2022 May 17.
