Comparison of named entity recognition methodologies in biomedical documents.

Affiliations

School of Software, Hallym University, Chuncheon, South Korea.

Bio-IT Research Center, Hallym University, Chuncheon, South Korea.

Publication information

Biomed Eng Online. 2018 Nov 6;17(Suppl 2):158. doi: 10.1186/s12938-018-0573-6.

DOI:10.1186/s12938-018-0573-6
PMID:30396340
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6219049/
Abstract

BACKGROUND

Biomedical named entity recognition (Bio-NER) is a fundamental task in handling biomedical text terms, such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is developed by using the BioNLP/NLPBA 2004 shared task. Experiments are conducted on a training and evaluation set provided by the task organizers.
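
As a rough illustration of the task described above, the sketch below assumes that sentences are tokenized and labeled with BIO-style tags over the five entity classes named in the abstract. The tag spellings (for example `B-cell_type`), the helper name `bio_to_spans`, and the example sentence are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Optional, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect contiguous B-/I- runs of the same type into typed spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    current: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if current is not None and not continues:
            spans.append((start, i, current))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and current is None):
            # A stray I- tag with no open span is treated as a span start.
            start, current = i, etype
    return spans


# Hypothetical example sentence with assumed tag spellings.
tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
tags = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]
spans = bio_to_spans(tags)
for start, end, etype in spans:
    print(etype, " ".join(tokens[start:end]))
```

Working with spans rather than individual tags keeps multi-token names such as "T cells" together as one entity.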

RESULTS

Our results show that, compared with a baseline having a 70.09% F1 score, the RNN Jordan- and Elman-type algorithms have F1 scores of approximately 60.53% and 58.80%, respectively. When we use CRF as a machine learning algorithm, CCA, GloVe, and Word2Vec have F1 scores of 72.73%, 72.74%, and 72.82%, respectively.
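
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it over entity spans, assuming that a prediction counts as correct only when both its boundaries and its type match a gold span exactly. The abstract does not state the matching criterion, so that rule, the function name, and the example spans are assumptions.

```python
from typing import Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted span is correct only if it is in the gold set."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: the second prediction has the right boundaries but the wrong type.
gold = {(0, 2, "DNA"), (4, 6, "cell_type"), (8, 9, "protein")}
predicted = {(0, 2, "DNA"), (4, 6, "cell_line"), (8, 9, "protein")}
p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

In this toy case two of three predictions match, so precision, recall, and F1 all equal 2/3.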

CONCLUSIONS

Using word embeddings constructed through unsupervised learning can save the time and cost required to construct the learning data.
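
One common way to use embeddings learned from unlabeled text in a CRF tagger is to add each vector dimension as a real-valued feature of the token. The sketch below shows only that feature-building step. The three-dimensional vectors are made-up stand-ins for pretrained embeddings, and the feature names and the function `token_features` are hypothetical; this is not the authors' feature set.

```python
from typing import Dict, List

# Made-up 3-dimensional vectors standing in for pretrained word embeddings.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "il-2": [0.9, 0.1, 0.0],
    "gene": [0.7, 0.2, 0.1],
}
UNKNOWN: List[float] = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary words


def token_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Combine simple surface features with the token's embedding dimensions."""
    word = tokens[i]
    features: Dict[str, float] = {
        "bias": 1.0,
        "is_capitalized": float(word[:1].isupper()),
        "has_digit": float(any(c.isdigit() for c in word)),
        "is_first": float(i == 0),
    }
    for d, value in enumerate(TOY_EMBEDDINGS.get(word.lower(), UNKNOWN)):
        features[f"emb_{d}"] = value
    return features


sentence = ["IL-2", "gene", "expression"]
feature_rows = [token_features(sentence, i) for i in range(len(sentence))]
for token, row in zip(sentence, feature_rows):
    print(token, row)
```

Because the embedding values come from unlabeled text, these features add information without requiring extra hand-annotated examples, which is the saving the conclusion refers to.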

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/e4bfd1dca77a/12938_2018_573_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/5e2777bec4f0/12938_2018_573_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/7625cf8b65ce/12938_2018_573_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/3a066c76d518/12938_2018_573_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/07e23dfac400/12938_2018_573_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/3c848cbefda3/12938_2018_573_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/fd1b450e47a0/12938_2018_573_Fig7_HTML.jpg
Fig. 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/ebd00d1dc596/12938_2018_573_Fig8_HTML.jpg

Similar articles

1
Comparison of named entity recognition methodologies in biomedical documents.
Biomed Eng Online. 2018 Nov 6;17(Suppl 2):158. doi: 10.1186/s12938-018-0573-6.
2
Biomedical named entity recognition using deep neural networks with contextual information.
BMC Bioinformatics. 2019 Dec 27;20(1):735. doi: 10.1186/s12859-019-3321-4.
3
Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach.
Int J Environ Res Public Health. 2019 Sep 27;16(19):3628. doi: 10.3390/ijerph16193628.
4
Long short-term memory RNN for biomedical named entity recognition.
BMC Bioinformatics. 2017 Oct 30;18(1):462. doi: 10.1186/s12859-017-1868-5.
5
Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.
J Biomed Semantics. 2016 Apr 27;7:22. doi: 10.1186/s13326-016-0059-z. eCollection 2016.
6
DTranNER: biomedical named entity recognition with deep learning-based label-label transition model.
BMC Bioinformatics. 2020 Feb 11;21(1):53. doi: 10.1186/s12859-020-3393-1.
7
SBLC: a hybrid model for disease named entity recognition based on semantic bidirectional LSTMs and conditional random fields.
BMC Med Inform Decis Mak. 2018 Dec 7;18(Suppl 5):114. doi: 10.1186/s12911-018-0690-y.
8
Character level and word level embedding with bidirectional LSTM - Dynamic recurrent neural network for biomedical named entity recognition from literature.
J Biomed Inform. 2020 Dec;112:103609. doi: 10.1016/j.jbi.2020.103609. Epub 2020 Oct 26.
9
Clinical Named Entity Recognition Using Deep Learning Models.
AMIA Annu Symp Proc. 2018 Apr 16;2017:1812-1819. eCollection 2017.
10
Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1110-1117. eCollection 2018.

Cited by

1
A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature.
Interdiscip Sci. 2024 Jun;16(2):333-344. doi: 10.1007/s12539-024-00605-2. Epub 2024 Feb 10.
2
A Systematic Approach to Configuring MetaMap for Optimal Performance.
Methods Inf Med. 2022 Dec;61(S 02):e51-e63. doi: 10.1055/a-1862-0421. Epub 2022 May 25.
3
Discovering and Summarizing Relationships Between Chemicals, Genes, Proteins, and Diseases in PubChem.
Front Res Metr Anal. 2021 Jul 12;6:689059. doi: 10.3389/frma.2021.689059. eCollection 2021.
4
Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review.
Genomics Inform. 2020 Jun;18(2):e13. doi: 10.5808/GI.2020.18.2.e13. Epub 2020 Jun 16.

References

1
Feature selection techniques for maximum entropy based biomedical named entity recognition.
J Biomed Inform. 2009 Oct;42(5):905-11. doi: 10.1016/j.jbi.2008.12.012. Epub 2009 Jan 23.
2
BANNER: an executable survey of advances in biomedical named entity recognition.
Pac Symp Biocomput. 2008:652-63.
3
NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition.
BMC Bioinformatics. 2006 Dec 18;7 Suppl 5(Suppl 5):S11. doi: 10.1186/1471-2105-7-S5-S11.
4
Quantitative assessment of dictionary-based protein named entity tagging.
J Am Med Inform Assoc. 2006 Sep-Oct;13(5):497-507. doi: 10.1197/jamia.M2085. Epub 2006 Jun 23.
5
Term identification in the biomedical literature.
J Biomed Inform. 2004 Dec;37(6):512-26. doi: 10.1016/j.jbi.2004.08.004.
6
Constructing biological knowledge bases by extracting information from text sources.
Proc Int Conf Intell Syst Mol Biol. 1999:77-86.
7
Automatic extraction of biological information from scientific text: protein-protein interactions.
Proc Int Conf Intell Syst Mol Biol. 1999:60-7.
8
An ontology for bioinformatics applications.
Bioinformatics. 1999 Jun;15(6):510-20. doi: 10.1093/bioinformatics/15.6.510.
9
Toward information extraction: identifying protein names from biological papers.
Pac Symp Biocomput. 1998:707-18.