From text to insight: large language models for chemical data extraction.

Authors

Schilling-Wilhelmi Mara, Ríos-García Martiño, Shabih Sherjeel, Gil María Victoria, Miret Santiago, Koch Christoph T, Márquez José A, Jablonka Kevin Maik

Affiliations

Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany.

Institute of Carbon Science and Technology (INCAR), CSIC, Francisco Pintado Fe 26, 33011 Oviedo, Spain.

Publication information

Chem Soc Rev. 2025 Feb 3;54(3):1125-1150. doi: 10.1039/d4cs00913d.

DOI: 10.1039/d4cs00913d
PMID: 39703015

Abstract

The vast majority of chemical knowledge exists in unstructured natural language, yet structured data is crucial for innovative and systematic materials design. Traditionally, the field has relied on manual curation and partial automation for data extraction for specific use cases. The advent of large language models (LLMs) represents a significant shift, potentially enabling non-experts to extract structured, actionable data from unstructured text efficiently. While applying LLMs to chemical and materials science data extraction presents unique challenges, domain knowledge offers opportunities to guide and validate LLM outputs. This tutorial review provides a comprehensive overview of LLM-based structured data extraction in chemistry, synthesizing current knowledge and outlining future directions. We address the lack of standardized guidelines and present frameworks for leveraging the synergy between LLMs and chemical expertise. This work serves as a foundational resource for researchers aiming to harness LLMs for data-driven chemical research. The insights presented here could significantly enhance how researchers across chemical disciplines access and utilize scientific information, potentially accelerating the development of novel compounds and materials for critical societal needs.
