
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking.

Authors

Liu Yunchao Lance, Dong Ha, Wang Xin, Moretti Rocco, Wang Yu, Su Zhaoqian, Gu Jiawei, Bodenheimer Bobby, Weaver Charles David, Meiler Jens, Derr Tyler

Affiliations

Computer Science Dept., Vanderbilt University (VU).

Neural Science Dept., Amherst College.

Publication information

ArXiv. 2024 Nov 14:arXiv:2411.09820v1.

PMID:39606732
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11601797/
Abstract

While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, thereby slowing the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: (1) we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure high-quality data in the datasets; (2) we propose a standardized model evaluation framework considering high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, which provides reliable benchmarking for drug discovery experts conducting real-world virtual screening; (3) we evaluate model performance through various research questions using the dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. In summary, we recommend adopting our proposed WelQrate as the gold standard in small molecule drug discovery benchmarking. The dataset collection, along with the curation code and experimental scripts, is publicly available at WelQrate.org.
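
The hierarchical curation idea in the abstract (primary screen, then confirmatory and counter screens, then PAINS filtering) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `Compound` record, its boolean fields, and the labeling rule in `curate` are all assumptions made for illustration. In practice the PAINS flag would come from a substructure filter in a cheminformatics toolkit, and the actual curation code is the one published at WelQrate.org.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical per-compound screening record (field names are assumptions)."""
    cid: str
    primary_hit: bool         # active in the primary high-throughput screen
    confirmed: bool           # still active in a confirmatory screen
    counter_screen_hit: bool  # active in a counter screen, so likely an artifact
    pains_flag: bool          # matched a PAINS substructure pattern


def curate(compounds):
    """Split compounds into (actives, inactives) under an assumed rule.

    Assumed rule: PAINS-flagged compounds are dropped from the dataset;
    a compound is labeled active only if it hit in the primary screen,
    was confirmed, and did not hit in the counter screen.
    """
    actives, inactives = [], []
    for c in compounds:
        if c.pains_flag:
            continue  # removed entirely rather than labeled
        is_active = c.primary_hit and c.confirmed and not c.counter_screen_hit
        (actives if is_active else inactives).append(c.cid)
    return actives, inactives


library = [
    Compound("A", True, True, False, False),     # survives every stage
    Compound("B", True, False, False, False),    # primary hit, not confirmed
    Compound("C", True, True, True, False),      # counter-screen artifact
    Compound("D", True, True, False, True),      # PAINS-flagged, dropped
    Compound("E", False, False, False, False),   # never a hit
]
actives, inactives = curate(library)
print(actives, inactives)
```

The point of the sketch is that a primary-screen hit alone does not make a compound active: compounds B and C would be false positives in a dataset built only from the primary screen.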

Figures 1-6 (from the PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b88/11601797/61c2f657c2b5/nihpp-2411.09820v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b88/11601797/00a65bac75ab/nihpp-2411.09820v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b88/11601797/fc315570e0fb/nihpp-2411.09820v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b88/11601797/ebd6a5ca95b5/nihpp-2411.09820v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b88/11601797/87bfef9de0c7/nihpp-2411.09820v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b88/11601797/0bbc13bb9f82/nihpp-2411.09820v1-f0006.jpg
