NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature.

Affiliation

National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

Publication information

Sci Data. 2021 Mar 25;8(1):91. doi: 10.1038/s41597-021-00875-1.

Abstract

Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.
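As an illustration of how a corpus of this kind might be used to evaluate a chemical entity tagger, the sketch below computes strict-match precision, recall, and F1 between gold and predicted mentions. It is a minimal sketch and not the evaluation procedure used by the authors: the `Mention` record, the `strict_prf` function, the document names, and the identifier strings are all hypothetical, and the corpus's actual file format is not assumed here.

```python
from typing import Iterable, NamedTuple, Tuple


class Mention(NamedTuple):
    """A hypothetical annotation record: one chemical mention in one document."""
    doc_id: str      # document the mention occurs in
    start: int       # character offset where the mention begins
    end: int         # character offset where the mention ends (exclusive)
    concept_id: str  # normalized identifier (placeholder strings in this sketch)


def strict_prf(gold: Iterable[Mention],
               predicted: Iterable[Mention]) -> Tuple[float, float, float]:
    """Strict matching: a prediction counts only if the document, both
    offsets, and the identifier all equal those of a gold mention."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
gold = [
    Mention("doc1", 0, 7, "ID_A"),
    Mention("doc1", 20, 28, "ID_B"),
    Mention("doc2", 5, 12, "ID_C"),
]
predicted = [
    Mention("doc1", 0, 7, "ID_A"),    # exact match
    Mention("doc1", 20, 28, "ID_X"),  # right span, wrong identifier
]

precision, recall, f1 = strict_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Strict matching is only one possible design choice; a relaxed variant that accepts overlapping spans, or that scores recognition and identifier normalization separately, would follow the same set-based pattern.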

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0afb/7994842/af2a943f7ed1/41597_2021_875_Fig1_HTML.jpg
