Evaluating word representation features in biomedical named entity recognition tasks.

Affiliations

Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Department of Medical Informatics, Second Military Medical University, Shanghai 200433, China.

Publication information

Biomed Res Int. 2014;2014:240403. doi: 10.1155/2014/240403. Epub 2014 Mar 6.

Abstract

Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and have shown good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER: clustering-based representation, distributional representation, and word embeddings. We selected one algorithm from each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining these different types of WR features further improved BNER performance, indicating that they are complementary to each other. By combining all three types of WR features, the improvements in F-measure on the BioCreAtIvE II GM and JNLPBA corpora were 3.75% and 1.39%, respectively, when compared with the systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features for BNER tasks.
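As an illustration of what "combining" the three types of WR features can look like in practice, the following minimal Python sketch builds a per-token feature dictionary of the kind typically fed to a sequence tagger. It is not the authors' implementation: the abstract does not name the specific algorithms or the tagger used, so the lookup tables (`CLUSTER_PATHS`, `DISTRIBUTIONAL_CLASS`, `EMBEDDINGS`), their toy values, the function name `token_features`, and the prefix lengths are all hypothetical stand-ins for resources that would normally be learned from a large unlabeled corpus.

```python
from typing import Dict, List, Sequence

# Hypothetical, hand-made lookup tables (assumptions for illustration only).
# In a real system each table would be induced from unlabeled biomedical text.
CLUSTER_PATHS: Dict[str, str] = {      # clustering-based: bit-string cluster IDs
    "il-2": "110100",
    "gene": "110111",
    "expression": "0101",
}
DISTRIBUTIONAL_CLASS: Dict[str, str] = {  # distributional: discretized class labels
    "il-2": "D17",
    "gene": "D17",
    "expression": "D4",
}
EMBEDDINGS: Dict[str, List[float]] = {    # word embeddings: dense real-valued vectors
    "il-2": [0.8, -0.3],
    "gene": [0.7, -0.1],
    "expression": [-0.2, 0.5],
}


def token_features(
    tokens: Sequence[str], i: int, prefix_lengths: Sequence[int] = (2, 4)
) -> Dict[str, float]:
    """Return baseline plus WR features for the token at position i."""
    word = tokens[i].lower()
    feats: Dict[str, float] = {"word=" + word: 1.0}  # baseline lexical feature

    # Clustering-based feature: prefixes of the cluster path give
    # coarser or finer groupings of the same word.
    path = CLUSTER_PATHS.get(word)
    if path is not None:
        for n in prefix_lengths:
            feats[f"cluster{n}={path[:n]}"] = 1.0

    # Distributional feature: one categorical label per word.
    dist_class = DISTRIBUTIONAL_CLASS.get(word)
    if dist_class is not None:
        feats["dist=" + dist_class] = 1.0

    # Embedding features: one real-valued feature per vector dimension.
    vector = EMBEDDINGS.get(word)
    if vector is not None:
        for k, value in enumerate(vector):
            feats[f"emb{k}"] = value

    return feats


if __name__ == "__main__":
    sentence = ["IL-2", "gene", "expression"]
    for idx in range(len(sentence)):
        print(sentence[idx], token_features(sentence, idx))
```

Words missing from a table simply receive no feature of that type, so the three feature groups can be switched on or off independently, which is one straightforward way to compare each WR type alone against their combination.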

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7587/3963372/eb02cb9f3ea1/BMRI2014-240403.001.jpg
