How Beneficial Is Pretraining on a Narrow Domain-Specific Corpus for Information Extraction about Photocatalytic Water Splitting?

Authors

Isazawa Taketomo, Cole Jacqueline M

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.

Publication

J Chem Inf Model. 2024 Apr 22;64(8):3205-3212. doi: 10.1021/acs.jcim.4c00063. Epub 2024 Mar 27.

Abstract

Language models trained on domain-specific corpora have been employed to improve performance on specialized tasks. However, little prior work has examined how specific a "domain-specific" corpus should be. Here, we test a number of language models trained on corpora of varying specificity by employing them in the task of extracting information about photocatalytic water splitting. We find that more specific corpora can benefit performance on downstream tasks. Furthermore, PhotocatalysisBERT, a model pretrained from scratch on scientific papers about photocatalytic water splitting, demonstrates improved performance over previous work in associating the correct photocatalyst with the correct photocatalytic activity during information extraction, achieving a precision of 60.8(+11.5)% and a recall of 37.2(+4.5)%.
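To make the "pretraining from scratch on a narrow corpus" setup concrete, the sketch below shows one plausible way to pretrain a BERT-style masked language model on a domain corpus using the Hugging Face transformers and datasets libraries. The corpus file name (photocatalysis_corpus.txt), model size, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

# Minimal sketch of domain-specific masked-language-model pretraining,
# in the spirit of PhotocatalysisBERT. File path and hyperparameters
# below are hypothetical placeholders.
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Plain-text corpus of photocatalysis papers, one passage per line.
corpus = load_dataset("text", data_files={"train": "photocatalysis_corpus.txt"})

# A WordPiece vocabulary trained on the domain corpus would normally
# accompany from-scratch pretraining; we reuse the standard BERT
# vocabulary here for brevity.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Randomly initialized weights: pretraining from scratch, as opposed to
# continued pretraining of an existing checkpoint.
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForMaskedLM(config)

# Standard 15% token masking for the masked-language-modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="photocatalysis-bert",
    per_device_train_batch_size=16,
    num_train_epochs=5,
    learning_rate=1e-4,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()

The random initialization is the key design choice the paper varies: a model pretrained from scratch sees only the narrow domain corpus, whereas continued pretraining inherits the distribution of a general-purpose checkpoint.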

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81ab/11040717/55f2476df87b/ci4c00063_0001.jpg
