
In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening.

Affiliation

Universität Hamburg, ZBH - Center for Bioinformatics, Research Group for Computational Molecular Design, Bundesstraße 43, 20146 Hamburg, Germany.

Publication information

J Chem Inf Model. 2019 Mar 25;59(3):947-961. doi: 10.1021/acs.jcim.8b00712. Epub 2019 Mar 5.

DOI: 10.1021/acs.jcim.8b00712
PMID: 30835112

Abstract

Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model's predictions are right for the right reason. Hence, we re-evaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, we demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from standard benchmarks. On the basis of these results, we conclude that there is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, we provide guidelines for setting up validation experiments and give a perspective on how new data sets could be generated.
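As an illustration of what a bias check on a benchmark might look like, the sketch below scores each ligand by one simple ligand-only property and measures how well that property alone separates actives from decoys. This is an assumption about one possible check, not the procedure described in the paper; the function names, the toy data, and the 0.2 deviation threshold are all hypothetical.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy (ties count 0.5)."""
    if not active_scores or not decoy_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def bias_report(actives, decoys, feature_names):
    """AUC obtained when each single ligand-only feature is used directly as a score."""
    report = {}
    for i, name in enumerate(feature_names):
        report[name] = roc_auc(
            [row[i] for row in actives],
            [row[i] for row in decoys],
        )
    return report


# Hypothetical toy data: (molecular weight, hydrogen-bond donors) per ligand.
toy_actives = [(420.0, 2), (455.0, 3), (390.0, 1), (510.0, 2)]
toy_decoys = [(310.0, 2), (350.0, 1), (400.0, 3), (295.0, 2)]

report = bias_report(toy_actives, toy_decoys, ["mol_weight", "h_bond_donors"])
for name, auc in report.items():
    # The 0.2 cutoff is an arbitrary illustrative choice, not a published criterion.
    flag = "possible bias" if abs(auc - 0.5) > 0.2 else "ok"
    print(f"{name}: AUC={auc:.2f} ({flag})")
```

An AUC far from 0.5 for a property that ignores the protein entirely would suggest that a model could score well on the benchmark without learning anything about binding, which is the kind of unnoticed bias the abstract warns about.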

Similar articles

1. In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening.
   J Chem Inf Model. 2019 Mar 25;59(3):947-961. doi: 10.1021/acs.jcim.8b00712. Epub 2019 Mar 5.
2. Boosted neural networks scoring functions for accurate ligand docking and ranking.
   J Bioinform Comput Biol. 2018 Apr;16(2):1850004. doi: 10.1142/S021972001850004X. Epub 2018 Feb 4.
3. Topology-Based and Conformation-Based Decoys Database: An Unbiased Online Database for Training and Benchmarking Machine-Learning Scoring Functions.
   J Med Chem. 2023 Jul 13;66(13):9174-9183. doi: 10.1021/acs.jmedchem.3c00801. Epub 2023 Jun 14.
4. Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark.
   Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab225.
5. Protein Family-Specific Models Using Deep Neural Networks and Transfer Learning Improve Virtual Screening and Highlight the Need for More Data.
   J Chem Inf Model. 2018 Nov 26;58(11):2319-2330. doi: 10.1021/acs.jcim.8b00350. Epub 2018 Oct 16.
6. Toward a benchmarking data set able to evaluate ligand- and structure-based virtual screening using public HTS data.
   J Chem Inf Model. 2015 Feb 23;55(2):343-53. doi: 10.1021/ci5005465. Epub 2015 Jan 28.
7. Selecting machine-learning scoring functions for structure-based virtual screening.
   Drug Discov Today Technol. 2019 Dec;32-33:81-87. doi: 10.1016/j.ddtec.2020.09.001. Epub 2020 Sep 19.
8. LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening.
   J Chem Inf Model. 2020 Sep 28;60(9):4263-4273. doi: 10.1021/acs.jcim.0c00155. Epub 2020 Apr 23.
9. A practical guide to machine-learning scoring for structure-based virtual screening.
   Nat Protoc. 2023 Nov;18(11):3460-3511. doi: 10.1038/s41596-023-00885-w. Epub 2023 Oct 16.
10. Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning.
    J Chem Inf Model. 2021 Nov 22;61(11):5362-5376. doi: 10.1021/acs.jcim.1c00511. Epub 2021 Oct 15.

Cited by

1. Spatio-temporal learning from molecular dynamics simulations for protein-ligand binding affinity prediction.
   Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf429.
2. Evidential deep learning-based drug-target interaction prediction.
   Nat Commun. 2025 Jul 26;16(1):6915. doi: 10.1038/s41467-025-62235-6.
3. ColdstartCPI: Induced-fit theory-guided DTI predictive model with improved generalization performance.
   Nat Commun. 2025 Jul 11;16(1):6436. doi: 10.1038/s41467-025-61745-7.
4. StructureNet: Physics-Informed Hybridized Deep Learning Framework for Protein-Ligand Binding Affinity Prediction.
   Bioengineering (Basel). 2025 May 10;12(5):505. doi: 10.3390/bioengineering12050505.
5. WDGBANDTI: A Deep Graph Convolutional Network-Based Bilinear Attention Network for Drug-Target Interaction Prediction with Domain Adaptation.
   Interdiscip Sci. 2025 May 23. doi: 10.1007/s12539-025-00714-6.
6. A beginner's approach to deep learning applied to VS and MD techniques.
   J Cheminform. 2025 Apr 8;17(1):47. doi: 10.1186/s13321-025-00985-7.
7. iScore: A ML-Based Scoring Function for De Novo Drug Discovery.
   J Chem Inf Model. 2025 Mar 24;65(6):2759-2772. doi: 10.1021/acs.jcim.4c02192. Epub 2025 Mar 4.
8. GNINA 1.3: the next increment in molecular docking with deep learning.
   J Cheminform. 2025 Mar 2;17(1):28. doi: 10.1186/s13321-025-00973-x.
9. Natural Language Processing Methods for the Study of Protein-Ligand Interactions.
   J Chem Inf Model. 2025 Mar 10;65(5):2191-2213. doi: 10.1021/acs.jcim.4c01907. Epub 2025 Feb 24.
10. Robustly interrogating machine learning-based scoring functions: what are they learning?
    Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf040.