Mining patents with large language models elucidates the chemical function landscape.

Authors

Kosonocky Clayton W, Wilke Claus O, Marcotte Edward M, Ellington Andrew D

Affiliations

Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78705, USA.

Department of Integrative Biology, University of Texas at Austin, Austin, TX 78705, USA.

Publication information

Digit Discov. 2024 May 7;3(6):1150-1159. doi: 10.1039/d4dd00011k. eCollection 2024 Jun 12.

DOI:10.1039/d4dd00011k
PMID:38873033
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11167698/
Abstract

The fundamental goal of small molecule discovery is to generate chemicals with target functionality. While this often proceeds through structure-based methods, we set out to investigate the practicality of methods that leverage the extensive corpus of chemical literature. We hypothesize that a sufficiently large text-derived chemical function dataset would mirror the actual landscape of chemical functionality. Such a landscape would implicitly capture complex physical and biological interactions given that chemical function arises from both a molecule's structure and its interacting partners. To evaluate this hypothesis, we built a Chemical Function (CheF) dataset of patent-derived functional labels. This dataset, comprising 631 K molecule-function pairs, was created using an LLM- and embedding-based method to obtain 1.5 K unique functional labels for approximately 100 K randomly selected molecules from their corresponding 188 K unique patents. We carry out a series of analyses demonstrating that the CheF dataset contains a semantically coherent textual representation of the functional landscape congruent with chemical structural relationships, thus approximating the actual chemical function landscape. We then demonstrate through several examples that this text-based functional landscape can be leveraged to identify drugs with target functionality using a model able to predict functional profiles from structure alone. We believe that functional label-guided molecular discovery may serve as an alternative approach to traditional structure-based methods in the pursuit of designing novel functional molecules.
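To make the terms "molecule-function pairs" and "functional profile" concrete, the following is a minimal sketch of how such pairs could be pivoted into one multi-hot label vector per molecule. It is an illustration under stated assumptions, not the authors' pipeline: the toy pairs, the SMILES strings, and the helper name `build_profiles` are all hypothetical and are not taken from the CheF dataset.

```python
from collections import defaultdict

# Hypothetical toy pairs of (SMILES string, functional label).
# These are illustrative only and are NOT drawn from the CheF dataset.
pairs = [
    ("CCO", "solvent"),
    ("CCO", "antiseptic"),
    ("CC(=O)Oc1ccccc1C(=O)O", "analgesic"),
    ("CC(=O)Oc1ccccc1C(=O)O", "anti-inflammatory"),
    ("CC(=O)Oc1ccccc1C(=O)O", "antiplatelet"),
]


def build_profiles(pairs):
    """Pivot (molecule, label) pairs into multi-hot functional profiles.

    Returns the sorted label vocabulary and a dict that maps each
    molecule to a 0/1 list aligned with that vocabulary.
    """
    labels = sorted({label for _, label in pairs})
    index = {label: i for i, label in enumerate(labels)}
    profiles = defaultdict(lambda: [0] * len(labels))
    for smiles, label in pairs:
        profiles[smiles][index[label]] = 1
    return labels, dict(profiles)


labels, profiles = build_profiles(pairs)

# Average number of distinct labels per molecule in the toy data.
mean_labels = len(set(pairs)) / len(profiles)

# The same ratio using the approximate figures quoted in the abstract:
# about 631 K pairs over about 100 K molecules.
abstract_ratio = 631_000 / 100_000
```

With the rounded figures from the abstract, the same ratio comes to roughly six functional labels per molecule. A structure-to-function model of the kind the abstract mentions would take a representation of the molecule as input and a vector like these as its multi-label target.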
