Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening.

Affiliations

Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China.

National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China.

Publication information

J Chem Inf Model. 2022 Nov 28;62(22):5485-5502. doi: 10.1021/acs.jcim.2c01149. Epub 2022 Oct 21.

DOI: 10.1021/acs.jcim.2c01149
PMID: 36268980

Abstract

In structure-based virtual screening (SBVS), it is critical that scoring functions capture protein-ligand atomic interactions. By focusing on the local domains of ligand binding pockets, a standardized pocket Pfam-based clustering (Pfam-cluster) approach was developed to assess the cross-target generalization ability of machine-learning scoring functions (MLSFs). Subsequently, 12 typical MLSFs were evaluated using random cross-validation (Random-CV), protein sequence similarity-based cross-validation (Seq-CV), and pocket Pfam-based cross-validation (Pfam-CV) methods. Surprisingly, all of the tested models showed decreased performances from Random-CV to Seq-CV to Pfam-CV experiments, not showing satisfactory generalization capacity. Our interpretable analysis suggested that the predictions on novel targets by MLSFs were dependent on buried solvent-accessible surface area (SASA)-related features of complex structures, with greater predicted binding affinities on complexes owning larger protein-ligand interfaces. By combining buried SASA-related features with target-specific patterns that were only shared among structurally similar compounds in the same cluster, the random forest (RF)-Score attained a good performance in the Random-CV test. Based on these findings, we strongly advise assessing the generalization ability of MLSFs with the Pfam-cluster approach and being cautious with the features learned by MLSFs.
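The difference between Random-CV and a cluster-based scheme such as Seq-CV or Pfam-CV can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster labels below are made-up toy values rather than real Pfam assignments, and the helper names (`random_folds`, `cluster_folds`, `leaked_clusters`) are hypothetical. The point is only that a random split usually spreads members of one cluster over several folds, whereas a cluster-based split keeps each cluster inside a single fold, so the test fold contains targets unlike those used for training.

```python
import random
from collections import defaultdict


def random_folds(n_samples, n_folds, seed=0):
    """Random-CV style split: shuffle sample indices and deal them into folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cluster_folds(cluster_ids, n_folds):
    """Cluster-based split: every sample of a cluster lands in the same fold."""
    members = defaultdict(list)
    for i, cid in enumerate(cluster_ids):
        members[cid].append(i)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest clusters first, each into the
    # currently smallest fold.
    for cid in sorted(members, key=lambda c: (-len(members[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    return folds


def leaked_clusters(folds, cluster_ids):
    """Count clusters whose members are spread over more than one fold."""
    seen = defaultdict(set)
    for f, fold in enumerate(folds):
        for i in fold:
            seen[cluster_ids[i]].add(f)
    return sum(1 for fold_set in seen.values() if len(fold_set) > 1)


# Hypothetical toy data: 30 complexes spread over 6 made-up pocket clusters.
cluster_ids = [f"cluster_{i % 6}" for i in range(30)]

rand = random_folds(len(cluster_ids), 3)
grouped = cluster_folds(cluster_ids, 3)

print("clusters split across folds (random): ", leaked_clusters(rand, cluster_ids))
print("clusters split across folds (grouped):", leaked_clusters(grouped, cluster_ids))
```

With the grouped folds, holding out one fold for testing guarantees that none of its clusters were seen during training, which is the property a cross-target generalization test relies on.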

Similar articles

1. Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening.
   J Chem Inf Model. 2022 Nov 28;62(22):5485-5502. doi: 10.1021/acs.jcim.2c01149. Epub 2022 Oct 21.
2. Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?
   Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa410.
3. Beware of the generic machine learning-based scoring functions in structure-based virtual screening.
   Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa070.
4. Data-augmented machine learning scoring functions for virtual screening of YTHDF1 m6A reader protein.
   Comput Biol Med. 2024 Dec;183:109268. doi: 10.1016/j.compbiomed.2024.109268. Epub 2024 Oct 12.
5. A Hybrid Docking and Machine Learning Approach to Enhance the Performance of Virtual Screening Carried out on Protein-Protein Interfaces.
   Int J Mol Sci. 2022 Nov 18;23(22):14364. doi: 10.3390/ijms232214364.
6. Topology-Based and Conformation-Based Decoys Database: An Unbiased Online Database for Training and Benchmarking Machine-Learning Scoring Functions.
   J Med Chem. 2023 Jul 13;66(13):9174-9183. doi: 10.1021/acs.jmedchem.3c00801. Epub 2023 Jun 14.
7. Machine learning in computational docking.
   Artif Intell Med. 2015 Mar;63(3):135-52. doi: 10.1016/j.artmed.2015.02.002. Epub 2015 Feb 16.
8. TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions.
   J Med Chem. 2022 Jun 9;65(11):7918-7932. doi: 10.1021/acs.jmedchem.2c00460. Epub 2022 Jun 1.
9. PharmRF: A machine-learning scoring function to identify the best protein-ligand complexes for structure-based pharmacophore screening with high enrichments.
   J Comput Chem. 2022 May 5;43(12):847-863. doi: 10.1002/jcc.26840. Epub 2022 Mar 18.
10. ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions.
   Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad295.

Cited by

1. Further exploration of the quantitative distance-energy and contact number-energy relationships for predicting the binding affinity of protein-ligand complexes.
   Biophys J. 2025 Apr 1;124(7):1166-1177. doi: 10.1016/j.bpj.2025.02.021. Epub 2025 Feb 27.
2. Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data.
   Commun Chem. 2025 Feb 8;8(1):41. doi: 10.1038/s42004-025-01428-y.
3. Robustly interrogating machine learning-based scoring functions: what are they learning?
   Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf040.
4. From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning.
   Adv Sci (Weinh). 2024 Oct;11(40):e2405404. doi: 10.1002/advs.202405404. Epub 2024 Aug 29.
5. Machine learning accelerates pharmacophore-based virtual screening of MAO inhibitors.
   Sci Rep. 2024 Apr 8;14(1):8228. doi: 10.1038/s41598-024-58122-7.
6. A flexible data-free framework for structure-based drug design with reinforcement learning.
   Chem Sci. 2023 Oct 19;14(43):12166-12181. doi: 10.1039/d3sc04091g. eCollection 2023 Nov 8.
7. A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening.
   J Chem Inf Model. 2023 May 22;63(10):2960-2974. doi: 10.1021/acs.jcim.3c00322. Epub 2023 May 11.