ChemDataExtractor 2.0: Autopopulated Ontologies for Materials Science.

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.

Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.

Publication information

J Chem Inf Model. 2021 Sep 27;61(9):4280-4289. doi: 10.1021/acs.jcim.1c00446. Epub 2021 Sep 16.

DOI: 10.1021/acs.jcim.1c00446
PMID: 34529432

Abstract

The ever-growing abundance of data in heterogeneous sources, such as scientific publications, has driven the development of automated techniques for data extraction. In the physical sciences, past work focused on the precise extraction of individual properties, whereas attention has recently turned to the extraction of higher-level relationships. Here, we present a framework for the automated population of ontologies, that is, the direct extraction of a larger group of properties linked by a semantic network. We exploit data-rich sources, such as tables within documents, and present a new model concept that enables data extraction for chemical and physical properties with the ability to organize hierarchical data as nested information. Combining these capabilities with automatically generated parsers for data extraction and forward-looking interdependency resolution, we illustrate the power of our approach via the automatic extraction of a crystallographic hierarchy of information. This hierarchy comprises 18 interrelated submodels of nested data, extracted from an evaluation set of scientific articles spanning 26 different journals, with an overall precision of 92.2%. Our method and associated toolkit, ChemDataExtractor 2.0, offer a key step toward the seamless integration of primary literature sources into a data-driven scientific framework.
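
To make the idea of organizing hierarchical data as nested information more concrete, the following is a minimal sketch in plain Python. It does not use the ChemDataExtractor 2.0 API: the class names (Compound, CrystalCell, LatticeParameters), the table column names, and the helper functions are all hypothetical and chosen only for illustration. It shows one flat table row being mapped onto a small nested crystallographic model, plus the usual definition of precision as true positives divided by all extracted records.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


# Hypothetical nested models; these are NOT ChemDataExtractor classes.
@dataclass
class LatticeParameters:
    a: Optional[float] = None  # cell lengths, assumed to be in angstrom
    b: Optional[float] = None
    c: Optional[float] = None


@dataclass
class CrystalCell:
    space_group: Optional[str] = None
    lattice: LatticeParameters = field(default_factory=LatticeParameters)


@dataclass
class Compound:
    name: str
    cell: CrystalCell = field(default_factory=CrystalCell)


def to_float(text: Optional[str]) -> Optional[float]:
    """Parse a table cell such as '5.431(2)', dropping the bracketed uncertainty."""
    if text is None or not text.strip():
        return None
    return float(text.split("(")[0])


def populate(row: dict) -> Compound:
    """Map one flat table row (hypothetical column names) onto the nested model."""
    lattice = LatticeParameters(
        a=to_float(row.get("a")),
        b=to_float(row.get("b")),
        c=to_float(row.get("c")),
    )
    cell = CrystalCell(space_group=row.get("space group"), lattice=lattice)
    return Compound(name=row["compound"], cell=cell)


def precision(true_positives: int, false_positives: int) -> float:
    """Precision = correct extracted records / all extracted records."""
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    record = populate({"compound": "Si", "space group": "Fd-3m", "a": "5.431(2)"})
    print(asdict(record))
```

Calling asdict on the populated record yields a nested dictionary whose structure mirrors the model hierarchy, which is the sense in which a group of related properties is extracted together rather than one property at a time.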
