Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications.

Author affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK.

Publication information

Sci Data. 2023 Sep 22;10(1):651. doi: 10.1038/s41597-023-02511-6.

DOI:10.1038/s41597-023-02511-6

PMID:37739960

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10517137/

Abstract

We present an automatically generated dataset of 15,755 records that were extracted from 47,357 papers. These records contain water-splitting activity in the presence of certain photocatalysts, along with additional information about the chemical reaction conditions under which this activity was recorded. These conditions include any co-catalysts and additives that were present during water splitting, the length of time for which the photocatalytic experiment was conducted, and the type of light source used, including its wavelength. Despite the text extraction of such a wide range of chemical reaction attributes, the dataset afforded good precision (71.2%) and recall (36.3%). These figures-of-merit were calculated based on a random sample of open-access papers from the corpus. Mining such a complex set of attributes required the development of novel techniques in knowledge extraction and interdependency resolution, leveraging inter- and intra-sentence relations, which are also described in this paper. We present a new version (version 2.2) of the chemistry-aware text-mining toolkit ChemDataExtractor, in which these new techniques are included.
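The precision and recall quoted above follow the usual information-extraction definitions. The sketch below restates those definitions in Python and, as an illustration only, combines the two reported figures into a harmonic-mean F-score. The F-score is derived here and is not reported in the abstract; the function names and the example counts are hypothetical and are not taken from the paper's evaluation.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of extracted records that are correct."""
    extracted = true_pos + false_pos
    return true_pos / extracted if extracted else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of records present in the papers that were extracted."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts from a manually checked sample (not the paper's data).
tp, fp, fn = 3, 1, 9
print(f"precision = {precision(tp, fp):.2f}")  # 3 correct out of 4 extracted
print(f"recall    = {recall(tp, fn):.2f}")     # 3 found out of 12 present

# F-score implied by the figures in the abstract (derived, not reported there).
print(f"F1 from 71.2% / 36.3% = {f1_score(0.712, 0.363):.3f}")  # 0.481
```

The gap between the two figures means that an extracted record is usually correct, while roughly two thirds of the records present in the sampled papers are missed.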

Similar articles

Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications.

Sci Data. 2023 Sep 22;10(1):651. doi: 10.1038/s41597-023-02511-6.

A database of thermally activated delayed fluorescent molecules auto-generated from scientific literature with ChemDataExtractor.

Sci Data. 2024 Jan 17;11(1):80. doi: 10.1038/s41597-023-02897-3.

ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature.

J Chem Inf Model. 2016 Oct 24;56(10):1894-1904. doi: 10.1021/acs.jcim.6b00207. Epub 2016 Oct 6.

Polymeric Carbon Nitride-Derived Photocatalysts for Water Splitting and Nitrogen Fixation.

Small. 2021 Apr;17(13):e2005149. doi: 10.1002/smll.202005149. Epub 2021 Mar 9.

Snowball 2.0: Generic Material Data Parser for ChemDataExtractor.

J Chem Inf Model. 2023 Nov 27;63(22):7045-7055. doi: 10.1021/acs.jcim.3c01281. Epub 2023 Nov 7.

A database of battery materials auto-generated using ChemDataExtractor.

Sci Data. 2020 Aug 6;7(1):260. doi: 10.1038/s41597-020-00602-2.

Auto-generated database of semiconductor band gaps using ChemDataExtractor.

Sci Data. 2022 May 3;9(1):193. doi: 10.1038/s41597-022-01294-6.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

ChemDataExtractor 2.0: Autopopulated Ontologies for Materials Science.

J Chem Inf Model. 2021 Sep 27;61(9):4280-4289. doi: 10.1021/acs.jcim.1c00446. Epub 2021 Sep 16.

[Construction of chemical information database based on optical structure recognition technique].

Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):352-357.

Cited by

Autogenerating a Domain-Specific Question-Answering Data Set from a Thermoelectric Materials Database to Enable High-Performing BERT Models.

J Chem Inf Model. 2025 Aug 25;65(16):8579-8592. doi: 10.1021/acs.jcim.5c00840. Epub 2025 Aug 7.

Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications.

J Chem Inf Model. 2025 Mar 10;65(5):2476-2486. doi: 10.1021/acs.jcim.4c02029. Epub 2025 Feb 11.

MechBERT: Language Models for Extracting Chemical and Property Relationships about Mechanical Stress and Strain.

J Chem Inf Model. 2025 Feb 24;65(4):1873-1888. doi: 10.1021/acs.jcim.4c00857. Epub 2025 Jan 31.

How Beneficial Is Pretraining on a Narrow Domain-Specific Corpus for Information Extraction about Photocatalytic Water Splitting?

J Chem Inf Model. 2024 Apr 22;64(8):3205-3212. doi: 10.1021/acs.jcim.4c00063. Epub 2024 Mar 27.

References

A thermoelectric materials database auto-generated from the scientific literature using ChemDataExtractor.

Sci Data. 2022 Oct 22;9(1):648. doi: 10.1038/s41597-022-01752-1.

Machine Learning for Electrocatalyst and Photocatalyst Design and Discovery.

Chem Rev. 2022 Aug 24;122(16):13478-13515. doi: 10.1021/acs.chemrev.2c00061. Epub 2022 Jul 21.

One-step scalable synthesis of honeycomb-like g-C3N4 with broad sub-bandgap absorption for superior visible-light-driven photocatalytic hydrogen evolution.

RSC Adv. 2019 Oct 14;9(56):32674-32682. doi: 10.1039/c9ra07068k. eCollection 2019 Oct 10.

Single Model for Organic and Inorganic Chemical Named Entity Recognition in ChemDataExtractor.

J Chem Inf Model. 2022 Mar 14;62(5):1207-1213. doi: 10.1021/acs.jcim.1c01199. Epub 2022 Feb 24.

ChemDataExtractor 2.0: Autopopulated Ontologies for Materials Science.

J Chem Inf Model. 2021 Sep 27;61(9):4280-4289. doi: 10.1021/acs.jcim.1c00446. Epub 2021 Sep 16.

A database of battery materials auto-generated using ChemDataExtractor.

Sci Data. 2020 Aug 6;7(1):260. doi: 10.1038/s41597-020-00602-2.

Data-Driven Systematic Search of Promising Photocatalysts for Water Splitting under Visible Light.

J Phys Chem Lett. 2019 Sep 5;10(17):5211-5218. doi: 10.1021/acs.jpclett.9b01977. Epub 2019 Aug 23.

One-pot photoassisted synthesis, in situ photocatalytic testing for hydrogen generation and the mechanism of binary nitrogen and copper promoted titanium dioxide.

Photochem Photobiol Sci. 2017 Jun 14;16(6):916-924. doi: 10.1039/c6pp00477f.

ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature.

J Chem Inf Model. 2016 Oct 24;56(10):1894-1904. doi: 10.1021/acs.jcim.6b00207. Epub 2016 Oct 6.

Nano-ferrites for water splitting: unprecedented high photocatalytic hydrogen production under visible light.

Nanoscale. 2012 Aug 21;4(16):5202-9. doi: 10.1039/c2nr30819c. Epub 2012 Jul 3.
