Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization.

Affiliation

Atomwise Inc., 221 Main Street, Suite 1350, San Francisco, California 94105, United States.

Publication information

J Chem Inf Model. 2018 May 29;58(5):916-932. doi: 10.1021/acs.jcim.7b00403. Epub 2018 May 8.

DOI:10.1021/acs.jcim.7b00403
PMID:29698607
Abstract

Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems, that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification, and we show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, it may be the case that the previously reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
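The abstract names AVE but does not spell out how it is computed. The Python sketch below assumes the nearest-neighbour formulation we understand the full paper to use: for a validation set V and a training set T, H(V, T) is the fraction of molecules in V whose nearest neighbour in T lies closer than a distance d, averaged over thresholds d from 0 to 1, and the bias combines four such terms as (H(V_A, T_A) - H(V_A, T_I)) + (H(V_I, T_I) - H(V_I, T_A)), where A and I denote actives and inactives. Fingerprints are modelled as sets of on-bit indices compared by Tanimoto distance. The function names (`tanimoto_distance`, `nn_fraction`, `ave_bias`) and the 101-point threshold grid are illustrative assumptions, not the authors' code.

```python
from typing import FrozenSet, Sequence

# Assumption: a fingerprint is the set of indices of its "on" bits
# (e.g. from an ECFP-style hashing); no cheminformatics library is needed.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity of two binary fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # treat two empty fingerprints as identical
    return 1.0 - len(a & b) / union


def nn_fraction(val: Sequence[Fingerprint], train: Sequence[Fingerprint],
                thresholds: Sequence[float]) -> float:
    """H(V, T): share of validation molecules whose nearest training
    neighbour is closer than d, averaged over every threshold d."""
    nearest = [min(tanimoto_distance(v, t) for t in train) for v in val]
    shares = [sum(x < d for x in nearest) / len(nearest) for d in thresholds]
    return sum(shares) / len(shares)


def ave_bias(val_act: Sequence[Fingerprint], val_inact: Sequence[Fingerprint],
             train_act: Sequence[Fingerprint], train_inact: Sequence[Fingerprint],
             steps: int = 100) -> float:
    """Assumed AVE-style bias: (AA - AI) + (II - IA). Values near 0 suggest
    the split does not reward nearest-neighbour memorization."""
    thresholds = [i / steps for i in range(steps + 1)]  # 0.00, 0.01, ..., 1.00

    def h(val, train):
        return nn_fraction(val, train, thresholds)

    return ((h(val_act, train_act) - h(val_act, train_inact))
            + (h(val_inact, train_inact) - h(val_inact, train_act)))


if __name__ == "__main__":
    # Toy split where each validation molecule duplicates a training molecule
    # of its own class and shares no bits with the other class.
    act, inact = [frozenset({1, 2, 3})], [frozenset({4, 5, 6})]
    print(ave_bias(act, inact, act, inact))  # about 1.98, i.e. 200/101
```

In this sketch, a split where every validation molecule sits right next to training molecules of its own class scores close to the maximum of 2, while a split where validation molecules are equally far from both training classes scores 0. The abstract reports that ligand-based model performance correlates strongly with this kind of bias, which is why a high score signals that a benchmark may reward memorization.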

Similar articles

1. Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization.
J Chem Inf Model. 2018 May 29;58(5):916-932. doi: 10.1021/acs.jcim.7b00403. Epub 2018 May 8.
2. LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening.
J Chem Inf Model. 2020 Sep 28;60(9):4263-4273. doi: 10.1021/acs.jcim.0c00155. Epub 2020 Apr 23.
3. Machine learning and ligand binding predictions: A review of data, methods, and obstacles.
Biochim Biophys Acta Gen Subj. 2020 Jun;1864(6):129545. doi: 10.1016/j.bbagen.2020.129545. Epub 2020 Feb 10.
4. Molecular interaction fingerprint approaches for GPCR drug discovery.
Curr Opin Pharmacol. 2016 Oct;30:59-68. doi: 10.1016/j.coph.2016.07.007. Epub 2016 Jul 29.
5. Benchmarking the Predictive Power of Ligand Efficiency Indices in QSAR.
J Chem Inf Model. 2016 Aug 22;56(8):1576-87. doi: 10.1021/acs.jcim.6b00136. Epub 2016 Jul 19.
6. Use of machine learning approaches for novel drug discovery.
Expert Opin Drug Discov. 2016;11(3):225-39. doi: 10.1517/17460441.2016.1146250.
7. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.
Methods. 2015 Jan;71:146-57. doi: 10.1016/j.ymeth.2014.11.015. Epub 2014 Dec 3.
8. A Ligand-Based Virtual Screening Method Using Direct Quantification of Generalization Ability.
Molecules. 2019 Jun 30;24(13):2414. doi: 10.3390/molecules24132414.
9. A big data approach with artificial neural network and molecular similarity for chemical data mining and endocrine disruption prediction.
Indian J Pharmacol. 2018 Jul-Aug;50(4):169-176. doi: 10.4103/ijp.IJP_304_17.
10. Development of New Methods Needs Proper Evaluation-Benchmarking Sets for Machine Learning Experiments for Class A GPCRs.
J Chem Inf Model. 2019 Dec 23;59(12):4974-4992. doi: 10.1021/acs.jcim.9b00689. Epub 2019 Nov 22.

Cited by

1. Scaling Structure Aware Virtual Screening to Billions of Molecules with SPRINT.
ArXiv. 2025 Jan 20:arXiv:2411.15418v2.
2. Efficient and Explainable Virtual Screening of Molecules through Fingerprint-Generating Networks Integrated with Artificial Neural Networks.
ACS Omega. 2025 Jan 28;10(5):4896-4911. doi: 10.1021/acsomega.4c10289. eCollection 2025 Feb 11.
3. Robustly interrogating machine learning-based scoring functions: what are they learning?
Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf040.
4. HDBind: encoding of molecular structure with hyperdimensional binary representations.
Sci Rep. 2024 Nov 23;14(1):29025. doi: 10.1038/s41598-024-80009-w.
5. PharmacoNet: deep learning-guided pharmacophore modeling for ultra-large-scale virtual screening.
Chem Sci. 2024 Nov 4;15(46):19473-19487. doi: 10.1039/d4sc04854g. eCollection 2024 Nov 27.
6. VDAC1-interacting molecules promote cell death in cancer organoids through mitochondrial-dependent metabolic interference.
iScience. 2024 Apr 30;27(6):109853. doi: 10.1016/j.isci.2024.109853. eCollection 2024 Jun 21.
7. Inference of drug off-target effects on cellular signaling using interactome-based deep learning.
iScience. 2024 Mar 14;27(4):109509. doi: 10.1016/j.isci.2024.109509. eCollection 2024 Apr 19.
8. AI is a viable alternative to high throughput screening: a 318-target study.
Sci Rep. 2024 Apr 2;14(1):7526. doi: 10.1038/s41598-024-54655-z.
9. A practical guide to machine-learning scoring for structure-based virtual screening.
Nat Protoc. 2023 Nov;18(11):3460-3511. doi: 10.1038/s41596-023-00885-w. Epub 2023 Oct 16.
10. Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors.
bioRxiv. 2023 Sep 6:2023.09.04.556234. doi: 10.1101/2023.09.04.556234.