Single Model for Organic and Inorganic Chemical Named Entity Recognition in ChemDataExtractor.

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.

Publication information

J Chem Inf Model. 2022 Mar 14;62(5):1207-1213. doi: 10.1021/acs.jcim.1c01199. Epub 2022 Feb 24.

DOI:10.1021/acs.jcim.1c01199
PMID:35199519
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9049593/
Abstract

Chemical Named Entity Recognition (NER) forms the basis of information extraction tasks in the chemical domain. However, while such tasks can involve multiple domains of chemistry at the same time, currently available named entity recognizers are specialized in one part of chemistry, resulting in such workflows failing for a biased subset of mentions. This paper presents a single model that performs at close to the state-of-the-art for organic (CHEMDNER, 89.7 F1 score) and inorganic (Matscholar, 88.0 F1 score) NER tasks at the same time. Our NER system utilizing the Bert architecture is available as part of ChemDataExtractor 2.1, along with the data sets and scripts used to train the model.
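
The abstract reports its results as F1 scores. As a rough illustration of what such a number measures, the sketch below computes exact-match, entity-level precision, recall, and F1 from BIO-tagged token sequences. This is an assumption about the metric: the abstract does not describe the evaluation protocol, and the `CHEM` label, the function names, and the toy data are hypothetical and not taken from ChemDataExtractor.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        # Close the open span unless this tag continues it.
        if label is not None and tag != f"I-{label}":
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Lenient choice: a stray I- tag opens a new span.
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over all sentences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)  # predicted and correct
        fp += len(pred_spans - gold_spans)  # predicted but wrong
        fn += len(gold_spans - pred_spans)  # missed gold entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (hypothetical): two gold mentions, one of them predicted.
gold = [["B-CHEM", "I-CHEM", "O", "B-CHEM"]]
pred = [["B-CHEM", "I-CHEM", "O", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under this exact-match convention a mention counts as correct only if both its boundaries and its type agree with the gold annotation, so a partially overlapping prediction is counted as one false positive and one false negative.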

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c44a/9049593/334fdececc39/ci1c01199_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c44a/9049593/ac4164b7535a/ci1c01199_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c44a/9049593/7f9ed06d814c/ci1c01199_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c44a/9049593/3356d54a0d1f/ci1c01199_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c44a/9049593/31ae520dcf1d/ci1c01199_0005.jpg
