

Automated Construction of a Photocatalysis Dataset for Water-Splitting Applications.

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK.

Publication information

Sci Data. 2023 Sep 22;10(1):651. doi: 10.1038/s41597-023-02511-6.

Abstract

We present an automatically generated dataset of 15,755 records that were extracted from 47,357 papers. These records contain water-splitting activity in the presence of certain photocatalysts, along with additional information about the chemical reaction conditions under which this activity was recorded. These conditions include any co-catalysts and additives that were present during water splitting, the length of time for which the photocatalytic experiment was conducted, and the type of light source used, including its wavelength. Despite the text extraction of such a wide range of chemical reaction attributes, the dataset afforded good precision (71.2%) and recall (36.3%). These figures-of-merit were calculated based on a random sample of open-access papers from the corpus. Mining such a complex set of attributes required the development of novel techniques in knowledge extraction and interdependency resolution, leveraging inter- and intra-sentence relations, which are also described in this paper. We present a new version (version 2.2) of the chemistry-aware text-mining toolkit ChemDataExtractor, in which these new techniques are included.
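The precision and recall quoted above follow the standard information-retrieval definitions. The following is a minimal sketch of how such figures are computed from an evaluation sample; the function names and the counts in the example call are hypothetical and are not taken from the paper. The last line combines the two reported values into an F1 score, which the abstract itself does not state.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts of extracted records."""
    precision = tp / (tp + fp)  # fraction of extracted records that are correct
    recall = tp / (tp + fn)     # fraction of correct records that were extracted
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not from the paper).
p, r = precision_recall(tp=8, fp=2, fn=12)

# F1 implied by the reported precision (71.2%) and recall (36.3%).
f1_reported = f1_score(0.712, 0.363)
print(f"precision={p:.3f} recall={r:.3f} implied F1={f1_reported:.3f}")
```

Under these definitions, the reported figures imply an F1 of about 0.48: most extracted records are correct, but many records present in the papers are missed.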


