Opening up connectivity between documents, structures and bioactivity.

Authors

Christopher Southan

Affiliations

Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, EH8 9XD, UK.

TW2Informatics Ltd, Västra Frölunda, Gothenburg, 42166, Sweden.

Publication information

Beilstein J Org Chem. 2020 Apr 2;16:596-606. doi: 10.3762/bjoc.16.54. eCollection 2020.

DOI:10.3762/bjoc.16.54
PMID:32280387
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7136548/
Abstract

Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D", where a bioactivity "A" with a quantitative result "R" (e.g., an IC50) is reported for a chemical structure "C" that modulates (e.g., inhibits) a protein target "P". A useful shorthand for this connectivity is thus DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative is to increase the flow of these relationships into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade, commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar period, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value, but it can be extracted automatically. This has largely been accomplished for patents (e.g., by IBM, SureChEMBL and WIPO), covering over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCLP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes. Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into open databases. Progress may come from trends such as open science, open access (OA), findable, accessible, interoperable and reusable (FAIR) data, the resource description framework (RDF) and WikiData. However, we will need to wait and see whether these are technically applicable to DARCP capture, and whether that opens up connectivity.
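As an illustration of the connectivity the abstract describes, a DARCP relationship can be thought of as a single structured record. The following is only a minimal sketch: the class name `DarcpRecord`, its field names, and the placeholder identifiers are assumptions made for this example and do not come from the article or from any of the databases it mentions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DarcpRecord:
    """One document-activity-result-compound-protein link (hypothetical schema)."""
    document: str                      # D: e.g. a PMID or patent number
    compound: str                      # C: e.g. a SMILES string or InChIKey
    activity: Optional[str] = None     # A: e.g. "IC50"
    result: Optional[float] = None     # R: the numeric value
    result_unit: Optional[str] = None  # unit for R, e.g. "nM"
    protein: Optional[str] = None      # P: e.g. a UniProt accession

    def connectivity(self) -> str:
        """Return "DARCP" when every element is present; anything less is "D-C"."""
        rest = (self.activity, self.result, self.result_unit, self.protein)
        return "DARCP" if all(v is not None for v in rest) else "D-C"


# Placeholder values for illustration only, not data from the article.
curated = DarcpRecord(
    document="DOC-0001",
    compound="CMPD-0001",
    activity="IC50",
    result=12.0,
    result_unit="nM",
    protein="TARGET-0001",
)
mined = DarcpRecord(document="DOC-0002", compound="CMPD-0002")

print(curated.connectivity())
print(mined.connectivity())
```

In this sketch, a manually curated entry carries all five elements, while an automatically mined entry carries only the document and compound, which mirrors the distinction the abstract draws between full DARCP and D-C-only extraction.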


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0f1/7136548/360e7ddb5874/Beilstein_J_Org_Chem-16-596-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0f1/7136548/adf389acec78/Beilstein_J_Org_Chem-16-596-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0f1/7136548/9936fc843fd3/Beilstein_J_Org_Chem-16-596-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0f1/7136548/708e2a228367/Beilstein_J_Org_Chem-16-596-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0f1/7136548/56ae0c49681d/Beilstein_J_Org_Chem-16-596-g005.jpg

Cited by

1
Fifteen years of ChEMBL and its role in cheminformatics and drug discovery.
J Cheminform. 2025 Mar 10;17(1):32. doi: 10.1186/s13321-025-00963-z.
2
Will the chemical probes please stand up?
RSC Med Chem. 2021 Jul 16;12(8):1428-1441. doi: 10.1039/d1md00138h. eCollection 2021 Aug 18.
