Wide-scope biomedical named entity recognition and normalization with CRFs, fuzzy matching and character level modeling.

Affiliations

Turku Centre for Computer Science, Turku, Finland.

Department of Future Technologies, University of Turku, Turku, Finland.

Publication information

Database (Oxford). 2018 Jan 1;2018:1-10. doi: 10.1093/database/bay096.

DOI:10.1093/database/bay096

PMID:30239666

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6146133/

Abstract

We present a system for automatically identifying a multitude of biomedical entities from the literature. This work is based on our previous efforts in the BioCreative VI: Interactive Bio-ID Assignment shared task, in which our system demonstrated state-of-the-art performance with the highest achieved results in named entity recognition. In this paper, we describe the original conditional random field-based system used in the shared task as well as experiments conducted since, including better hyperparameter tuning and character-level modeling, which led to further performance improvements. For normalizing the mentions into unique identifiers, we use fuzzy character n-gram matching. The normalization approach has also been improved with a better abbreviation resolution method and stricter guideline compliance, resulting in vastly improved results for various entity types. All tools and models used for both named entity recognition and normalization are publicly available under an open license.

Database URL: https://github.com/TurkuNLP/BioCreativeVI_BioID_assignment
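To make the idea of fuzzy character n-gram matching concrete, here is a minimal sketch in Python. It is not the authors' implementation: the use of padded character trigrams, cosine similarity over raw counts, the `threshold` value, and the helper names `char_ngrams`, `cosine` and `normalize` are all assumptions made for illustration, and the three-entry lexicon is a toy example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return a Counter of character n-grams for a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def normalize(mention, lexicon, threshold=0.5):
    """Map a mention to the identifier of its most similar lexicon name.

    `lexicon` maps surface names to identifiers. Returns None when no name
    reaches the similarity threshold (the threshold is an assumed value).
    """
    query = char_ngrams(mention)
    best_name, best_score = None, 0.0
    for name in lexicon:
        score = cosine(query, char_ngrams(name))
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return lexicon[best_name]


# Toy lexicon for illustration only; a real system would load a full ontology.
lexicon = {
    "Homo sapiens": "NCBI taxon:9606",
    "Mus musculus": "NCBI taxon:10090",
    "HeLa cell": "CVCL_0030",
}

print(normalize("Homo sapien", lexicon))
```

Because similarity is computed over overlapping character fragments rather than whole tokens, small spelling variations in a mention still land on the intended lexicon entry, while a mention that shares no trigrams with any name is left unnormalized.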

Similar articles

Stacked ensemble combined with fuzzy matching for biomedical named entity recognition of diseases.

J Biomed Inform. 2016 Dec;64:1-9. doi: 10.1016/j.jbi.2016.09.009. Epub 2016 Sep 12.

LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools.

J Cheminform. 2019 Jan 10;11(1):3. doi: 10.1186/s13321-018-0327-2.

Curatable Named-Entity Recognition Using Semantic Relations.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):785-92. doi: 10.1109/TCBB.2014.2366770.

Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion.

Database (Oxford). 2016 Aug 7;2016. doi: 10.1093/database/baw112. Print 2016.

FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.

BMC Bioinformatics. 2018 Jun 28;19(1):248. doi: 10.1186/s12859-018-2211-5.

Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes.

BMC Bioinformatics. 2020 Jan 30;21(1):35. doi: 10.1186/s12859-020-3375-3.

Long short-term memory RNN for biomedical named entity recognition.

BMC Bioinformatics. 2017 Oct 30;18(1):462. doi: 10.1186/s12859-017-1868-5.

Two-phase biomedical named entity recognition using CRFs.

Comput Biol Chem. 2009 Aug;33(4):334-8. doi: 10.1016/j.compbiolchem.2009.07.004. Epub 2009 Aug 4.

D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information.

Bioinformatics. 2018 Oct 15;34(20):3539-3546. doi: 10.1093/bioinformatics/bty356.

Cited by

Construction, evaluation, and application of an electronic medical record corpus for cerebral palsy rehabilitation.

Digit Health. 2024 Sep 27;10:20552076241286260. doi: 10.1177/20552076241286260. eCollection 2024 Jan-Dec.

Chemical entity normalization for successful translational development of Alzheimer's disease and dementia therapeutics.

J Biomed Semantics. 2024 Jul 31;15(1):13. doi: 10.1186/s13326-024-00314-1.

Gilda: biomedical entity text normalization with machine-learned disambiguation as a service.

Bioinform Adv. 2022 May 11;2(1):vbac034. doi: 10.1093/bioadv/vbac034. eCollection 2022.

Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies.

Front Genet. 2020 Dec 22;11:618862. doi: 10.3389/fgene.2020.618862. eCollection 2020.

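Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes.

BMC Bioinformatics. 2020 Jan 30;21(1):35. doi: 10.1186/s12859-020-3375-3.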
