TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions.

Affiliations

Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China.

State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China.

Publication information

J Med Chem. 2022 Jun 9;65(11):7918-7932. doi: 10.1021/acs.jmedchem.2c00460. Epub 2022 Jun 1.

DOI:10.1021/acs.jmedchem.2c00460
PMID:35642777
Abstract

Development of accurate machine-learning-based scoring functions (MLSFs) for structure-based virtual screening against a given target requires a large, unbiased dataset with structurally diverse actives and decoys. However, most datasets used to develop MLSFs were designed for traditional SFs and may suffer from hidden biases and data insufficiency. Here, we developed a new approach named topology-based and conformation-based decoys generation (TocoDecoy). It integrates two strategies that generate decoys by tweaking the actives of a specific target, yielding unbiased and expandable datasets for training and benchmarking MLSFs. To evaluate hidden bias, we assessed the performance of InteractionGraphNet (IGN) trained on the TocoDecoy, LIT-PCBA, and DUD-E-like datasets. The IGN model trained on the TocoDecoy dataset is competitive with the one trained on LIT-PCBA and remarkably outperforms the one trained on the DUD-E-like dataset, suggesting that the decoys in TocoDecoy are unbiased for training and benchmarking MLSFs.
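The abstract benchmarks TocoDecoy against DUD-E-like datasets. DUD-E-style decoys are conventionally chosen to resemble an active in physicochemical properties while being topologically dissimilar to it. The Python sketch below illustrates only that generic property-matched, low-similarity filter. It is not TocoDecoy's own generation strategy. Fingerprints are modeled as hypothetical sets of on-bit indices. The property tuple, the per-property tolerances, and the 0.3 Tanimoto cutoff are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    bits: FrozenSet[int]      # hypothetical fingerprint: indices of "on" bits
    props: Tuple[float, ...]  # assumed properties, e.g. (MW, logP, HBD, HBA)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def properties_match(p: Tuple[float, ...], q: Tuple[float, ...],
                     tol: Tuple[float, ...]) -> bool:
    """True if every property in q lies within its tolerance of the value in p."""
    return all(abs(x - y) <= t for x, y, t in zip(p, q, tol))


def select_decoys(active: Molecule, candidates: List[Molecule],
                  tol: Tuple[float, ...], max_sim: float = 0.3,
                  n: int = 5) -> List[Molecule]:
    """Keep candidates that mimic the active's properties but not its topology."""
    picked = [
        c for c in candidates
        if properties_match(active.props, c.props, tol)
        and tanimoto(active.bits, c.bits) < max_sim
    ]
    # Rank the most topologically dissimilar candidates first.
    picked.sort(key=lambda c: tanimoto(active.bits, c.bits))
    return picked[:n]


# Toy usage with made-up molecules (illustrative values only).
active = Molecule("active", frozenset({1, 2, 3, 4, 5}), (350.0, 2.5, 2, 5))
candidates = [
    Molecule("close_analogue", frozenset({1, 2, 3, 4, 6}), (352.0, 2.4, 2, 5)),
    Molecule("prop_mismatch", frozenset({10, 11}), (500.0, 5.0, 0, 9)),
    Molecule("good_decoy", frozenset({3, 20, 21, 22}), (348.0, 2.7, 2, 4)),
    Molecule("disjoint_decoy", frozenset({30, 31}), (355.0, 2.3, 1, 5)),
]
decoys = select_decoys(active, candidates, tol=(25.0, 0.5, 1, 1))
print([d.name for d in decoys])  # ['disjoint_decoy', 'good_decoy']
```

The close analogue is rejected because it is too similar to the active (Tanimoto of about 0.67). The property mismatch is rejected because of its molecular weight and logP. Decoys selected this way differ from actives mainly in topology, so a model may learn to separate them by scaffold rather than by protein-ligand interactions. This is one commonly cited form of hidden bias in DUD-E-like data.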

Similar articles

1. TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions.
   J Med Chem. 2022 Jun 9;65(11):7918-7932. doi: 10.1021/acs.jmedchem.2c00460. Epub 2022 Jun 1.
2. Topology-Based and Conformation-Based Decoys Database: An Unbiased Online Database for Training and Benchmarking Machine-Learning Scoring Functions.
   J Med Chem. 2023 Jul 13;66(13):9174-9183. doi: 10.1021/acs.jmedchem.3c00801. Epub 2023 Jun 14.
3. Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening?
   Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa410.
4. LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening.
   J Chem Inf Model. 2020 Sep 28;60(9):4263-4273. doi: 10.1021/acs.jcim.0c00155. Epub 2020 Apr 23.
5. ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions.
   Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad295.
6. Beware of the generic machine learning-based scoring functions in structure-based virtual screening.
   Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa070.
7. MILCDock: Machine Learning Enhanced Consensus Docking for Virtual Screening in Drug Discovery.
   J Chem Inf Model. 2022 Nov 28;62(22):5342-5350. doi: 10.1021/acs.jcim.2c00705. Epub 2022 Nov 7.
8. TB-IECS: an accurate machine learning-based scoring function for virtual screening.
   J Cheminform. 2023 Jul 4;15(1):63. doi: 10.1186/s13321-023-00731-x.
9. Data-augmented machine learning scoring functions for virtual screening of YTHDF1 m6A reader protein.
   Comput Biol Med. 2024 Dec;183:109268. doi: 10.1016/j.compbiomed.2024.109268. Epub 2024 Oct 12.
10. Improving structure-based virtual screening performance via learning from scoring function components.
   Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa094.

Cited by

1. SurfDock is a surface-informed diffusion generative model for reliable and accurate protein-ligand complex prediction.
   Nat Methods. 2025 Feb;22(2):310-322. doi: 10.1038/s41592-024-02516-y. Epub 2024 Nov 27.
2. Integrated Molecular Modeling and Machine Learning for Drug Design.
   J Chem Theory Comput. 2023 Nov 14;19(21):7478-7495. doi: 10.1021/acs.jctc.3c00814. Epub 2023 Oct 26.
3. Open-Source Machine Learning in Computational Chemistry.
   J Chem Inf Model. 2023 Aug 14;63(15):4505-4532. doi: 10.1021/acs.jcim.3c00643. Epub 2023 Jul 19.
4. TB-IECS: an accurate machine learning-based scoring function for virtual screening.
   J Cheminform. 2023 Jul 4;15(1):63. doi: 10.1186/s13321-023-00731-x.
5. Comprehensive Survey of Consensus Docking for High-Throughput Virtual Screening.
   Molecules. 2022 Dec 25;28(1):175. doi: 10.3390/molecules28010175.