Comparison of named entity recognition methodologies in biomedical documents.

Affiliations

School of Software, Hallym University, Chuncheon, South Korea.

Bio-IT Research Center, Hallym University, Chuncheon, South Korea.

Publication information

Biomed Eng Online. 2018 Nov 6;17(Suppl 2):158. doi: 10.1186/s12938-018-0573-6.

Abstract

BACKGROUND

Biomedical named entity recognition (Bio-NER) identifies biomedical terms in text, such as RNA, protein, cell type, cell line, and DNA mentions, and is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here was developed on the BioNLP/NLPBA 2004 shared task. Experiments were conducted on the training and evaluation sets provided by the task organizers.
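To make the task concrete, the following is a minimal sketch of how entity mentions can be read off a token-level tag sequence. It assumes a BIO (IOB2) tagging scheme with the five entity classes named above; the tag spellings, the example sentence, and the function name `bio_to_spans` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple

# (start index, end index exclusive, entity type); a hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a BIO tag sequence.

    A stray "I-X" tag with no open entity of type X is treated leniently
    as the start of a new entity.
    """
    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                start, etype = i, tag[2:]
    return spans


# Illustrative sentence and tags (made up for this sketch).
tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
tags = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]

for s, e, t in bio_to_spans(tags):
    print(t, " ".join(tokens[s:e]))
```

A tagger, whether a CRF or an RNN, only has to predict one tag per token; the span extraction above then turns those tags into typed entity mentions.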

RESULTS

Our results show that, against a baseline F1 score of 70.09%, the Jordan- and Elman-type RNN algorithms score lower, with F1 scores of approximately 60.53% and 58.80%, respectively. With CRF as the machine learning algorithm, the CCA, GloVe, and Word2Vec word embeddings reach F1 scores of 72.73%, 72.74%, and 72.82%, respectively.
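For readers unfamiliar with the metric, here is a minimal sketch of how an entity-level F1 score can be computed. It assumes exact-match scoring (a predicted entity counts only if its boundaries and type both match a gold entity); the span representation, the function name `prf1`, and the example data are hypothetical and do not reproduce the paper's evaluation script.

```python
from typing import Set, Tuple

# (sentence id, start, end exclusive, entity type); a hypothetical representation
Span = Tuple[int, int, int, str]


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return exact-match precision, recall, and F1 over entity spans."""
    tp = len(gold & pred)  # spans predicted with correct boundaries and type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up gold and predicted spans for illustration only.
gold = {(0, 0, 2, "DNA"), (0, 4, 6, "cell_type"), (1, 3, 4, "protein")}
pred = {(0, 0, 2, "DNA"), (0, 4, 5, "cell_type"), (1, 3, 4, "protein"), (1, 7, 8, "RNA")}

p, r, f = prf1(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} F1={f:.4f}")
```

In this toy example two of four predictions are correct and two of three gold entities are found, so the harmonic mean rewards neither over-prediction nor under-prediction; a boundary error such as the truncated cell-type span costs both precision and recall.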

CONCLUSIONS

Using word embeddings constructed through unsupervised learning saves the time and cost required to construct training data.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d010/6219049/e4bfd1dca77a/12938_2018_573_Fig1_HTML.jpg
