Development of New Methods Needs Proper Evaluation-Benchmarking Sets for Machine Learning Experiments for Class A GPCRs.

Affiliations

Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland.

Department of Technology and Biotechnology of Drugs, Jagiellonian University Medical College, 9 Medyczna Street, 30-688 Kraków, Poland.

Publication information

J Chem Inf Model. 2019 Dec 23;59(12):4974-4992. doi: 10.1021/acs.jcim.9b00689. Epub 2019 Nov 22.

DOI: 10.1021/acs.jcim.9b00689
PMID: 31604014

Abstract

New computational approaches for virtual screening applications are constantly being developed. However, before a particular tool is used to search for new active compounds, its effectiveness in that type of task must be examined. In this study, we conducted a detailed analysis of various aspects of preparing data sets for such an evaluation. We propose a protocol for fetching data from the ChEMBL database, examine various compound representations in terms of the possible bias resulting from the way they are generated, and define a new metric for comparing the structural similarity of compounds, which is in line with chemical intuition. The newly developed metric is also used to evaluate various approaches for dividing a data set into training and test parts, which are themselves examined in detail as a possible source of bias in the results. Finally, machine learning methods are applied in cross-validation studies of the data sets constructed within the paper, which constitute benchmarks for the assessment of computational methods developed for virtual screening tasks. Additionally, analogous data sets for class A G protein-coupled receptors (the 100 targets with the highest number of records) were prepared. They are available at http://gmum.net/benchmarks/, together with a script enabling reproduction of all results, available at https://github.com/lesniak43/ananas.
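
The abstract's point about train/test division as a source of bias can be illustrated with a small sketch. This is not the authors' protocol or their new similarity metric, neither of which is specified in the abstract. As a stand-in it assumes Tanimoto similarity on toy fingerprints (sets of "on" bit indices) and a greedy leader clustering; all function names and the toy data are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_clusters(fps, threshold):
    """Greedy leader clustering: a compound joins the first cluster whose
    leader is at least `threshold` similar, otherwise it starts a new cluster."""
    leaders, clusters = [], []
    for idx, fp in enumerate(fps):
        for c, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(idx)
            clusters.append([idx])
    return clusters


def cluster_split(fps, threshold=0.5, test_fraction=0.25, seed=0):
    """Move whole clusters into the test set until it holds roughly
    `test_fraction` of the compounds; the remaining clusters form the training set."""
    clusters = leader_clusters(fps, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(fps)
    train, test = [], []
    for cluster in clusters:
        (test if len(test) < target else train).extend(cluster)
    return sorted(train), sorted(test)


def random_split(n, test_fraction=0.25, seed=0):
    """Plain random split that ignores structural similarity."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(test_fraction * n))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def mean_nearest_train_similarity(fps, train, test):
    """Average, over test compounds, of the similarity to the closest training compound."""
    return sum(max(tanimoto(fps[t], fps[r]) for r in train) for t in test) / len(test)


# Toy data (hypothetical): three chemical series sharing bits within a series only.
toy_fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
    {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 12, 15},
    {20, 21, 22, 23}, {20, 21, 22, 24},
]

for name, (train, test) in {
    "random": random_split(len(toy_fps)),
    "cluster": cluster_split(toy_fps),
}.items():
    print(name, test, mean_nearest_train_similarity(toy_fps, train, test))
```

With a cluster-based split, no test compound has a close analogue in the training set, so a model cannot score well merely by memorising a series. A random split may place near-duplicates on both sides, which tends to inflate measured performance.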

Similar articles

1. Development of New Methods Needs Proper Evaluation-Benchmarking Sets for Machine Learning Experiments for Class A GPCRs.
   J Chem Inf Model. 2019 Dec 23;59(12):4974-4992. doi: 10.1021/acs.jcim.9b00689. Epub 2019 Nov 22.
2. An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.
   J Chem Inf Model. 2014 May 27;54(5):1433-50. doi: 10.1021/ci500062f. Epub 2014 May 1.
3. LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening.
   J Chem Inf Model. 2020 Sep 28;60(9):4263-4273. doi: 10.1021/acs.jcim.0c00155. Epub 2020 Apr 23.
4. Quo vadis G protein-coupled receptor ligands? A tool for analysis of the emergence of new groups of compounds over time.
   Bioorg Med Chem Lett. 2017 Feb 1;27(3):626-631. doi: 10.1016/j.bmcl.2016.12.001. Epub 2016 Dec 2.
5. Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.
   J Chem Inf Model. 2015 Jul 27;55(7):1297-307. doi: 10.1021/acs.jcim.5b00090. Epub 2015 Jun 18.
6. Improved Method of Structure-Based Virtual Screening via Interaction-Energy-Based Learning.
   J Chem Inf Model. 2019 Mar 25;59(3):1050-1061. doi: 10.1021/acs.jcim.8b00673. Epub 2019 Mar 18.
7. GPCR-Bench: A Benchmarking Set and Practitioners' Guide for G Protein-Coupled Receptor Docking.
   J Chem Inf Model. 2016 Apr 25;56(4):642-51. doi: 10.1021/acs.jcim.5b00660. Epub 2016 Mar 24.
8. Practical Model Selection for Prospective Virtual Screening.
   J Chem Inf Model. 2019 Jan 28;59(1):282-293. doi: 10.1021/acs.jcim.8b00363. Epub 2018 Dec 18.
9. Big Data Challenges Targeting Proteins in GPCR Signaling Pathways; Combining PTML-ChEMBL Models and [35S]GTPγS Binding Assays.
   ACS Chem Neurosci. 2019 Nov 20;10(11):4476-4491. doi: 10.1021/acschemneuro.9b00302. Epub 2019 Nov 4.
10. REPROVIS-DB: a benchmark system for ligand-based virtual screening derived from reproducible prospective applications.
    J Chem Inf Model. 2011 Oct 24;51(10):2467-73. doi: 10.1021/ci200309j. Epub 2011 Sep 26.

Cited by

1. Mutual Support of Ligand- and Structure-Based Approaches-To What Extent We Can Optimize the Power of Predictive Model? Case Study of Opioid Receptors.
   Molecules. 2021 Mar 14;26(6):1607. doi: 10.3390/molecules26061607.
2. How Sure Can We Be about ML Methods-Based Evaluation of Compound Activity: Incorporation of Information about Prediction Uncertainty Using Deep Learning Techniques.
   Molecules. 2020 Mar 23;25(6):1452. doi: 10.3390/molecules25061452.