Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds.

Affiliation

DECS Global Compound Sciences, Computational Chemistry, AstraZeneca R&D Mölndal, S-43183 Mölndal, Sweden.

Publication information

J Cheminform. 2009 Jul 6;1(1):10. doi: 10.1186/1758-2946-1-10.

DOI:10.1186/1758-2946-1-10
PMID:20298516
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3225862/
Abstract

BACKGROUND

Since 2004 public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen because of their bioactive content, availability of downloads and facility to select informative subsets.

RESULTS

Where they could be calculated, extracted compounds per journal article were in the range of 12 to 19, but compound-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a big increase in patent-derived structures entering PubChem since 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds extracted by independent expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public), but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem, 1.2 million did not.
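
The pair-wise overlap and unique-content comparison described above can be illustrated with plain set operations. The following is a minimal sketch, not the authors' method: it assumes each source has already been reduced to a set of standardised structure identifiers, and the source names, identifiers and helper functions (`pairwise_overlap`, `unique_content`) are hypothetical toy values chosen for illustration.

```python
from itertools import combinations


def pairwise_overlap(sources):
    """Return {(a, b): (shared, only_a, only_b)} for every pair of sources.

    `sources` maps a source name to a set of structure identifiers.
    Assumption: identifiers were standardised beforehand, so equal
    structures compare equal across sources.
    """
    result = {}
    for a, b in combinations(sorted(sources), 2):
        set_a, set_b = sources[a], sources[b]
        shared = len(set_a & set_b)
        result[(a, b)] = (shared, len(set_a) - shared, len(set_b) - shared)
    return result


def unique_content(sources, name):
    """Return identifiers found in `name` and in no other source."""
    others = set().union(*(s for n, s in sources.items() if n != name))
    return sources[name] - others


# Hypothetical toy data; real inputs would be millions of identifiers.
sources = {
    "public_db": {"K1", "K2", "K3", "K4"},
    "commercial_a": {"K3", "K4", "K5"},
    "commercial_b": {"K4", "K5", "K6", "K7"},
}

overlaps = pairwise_overlap(sources)

# Aggregate the commercial sources and split them by presence in the public one.
commercial = sources["commercial_a"] | sources["commercial_b"]
in_public = commercial & sources["public_db"]
not_in_public = commercial - sources["public_db"]

for pair, counts in overlaps.items():
    print(pair, counts)
print(len(in_public), len(not_in_public))
```

Applied to the aggregate figures quoted in the abstract, the same split corresponds to roughly 1 million commercial compounds also present in PubChem and 1.2 million absent from it, that is, about 45% of the 2.2 million aggregated commercial structures were covered by PubChem.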

CONCLUSION

On the basis of chemical structure content per se public sources have covered an increasing proportion of commercial databases over the last two years. However, commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that both commercial and public sources are complementary for its exploration.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54f4/3225862/8beb56399242/1758-2946-1-10-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54f4/3225862/c1e1004b9c66/1758-2946-1-10-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54f4/3225862/cd7920f1ed04/1758-2946-1-10-2.jpg
