Leser Ulf, Hakenberg Jörg
Department for Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, Germany.
Brief Bioinform. 2005 Dec;6(4):357-69. doi: 10.1093/bib/6.4.357.
The recognition of biomedical concepts in natural text (named entity recognition, NER) is a key technology for the automatic or semi-automatic analysis of textual resources. Precise NER tools are a prerequisite for many text-based applications, such as information retrieval, information extraction and document classification. In recent years, the problem has attracted considerable attention in the bioinformatics community, and experience has shown that NER in the life sciences is a rather difficult problem. Several systems and algorithms have been devised and implemented. This paper describes the problems and resources in NER research, sketches the principal algorithms underlying most systems, and surveys the current state of the art in the field.