Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.
ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
J Chem Inf Model. 2022 Mar 14;62(5):1207-1213. doi: 10.1021/acs.jcim.1c01199. Epub 2022 Feb 24.
Chemical named entity recognition (NER) forms the basis of information extraction tasks in the chemical domain. However, while such tasks can span multiple subdomains of chemistry at once, currently available named entity recognizers each specialize in one part of chemistry, so workflows built on them fail for a biased subset of mentions. This paper presents a single model that performs close to the state of the art on organic (CHEMDNER, 89.7 F1 score) and inorganic (Matscholar, 88.0 F1 score) NER tasks at the same time. Our NER system, which uses the BERT architecture, is available as part of ChemDataExtractor 2.1, along with the data sets and scripts used to train the model.