Leaman Robert, Gonzalez Graciela
Department of Computer Science and Engineering, Arizona State University, USA.
Pac Symp Biocomput. 2008:652-63.
There has been an increasing amount of research on biomedical named entity recognition (NER), the most basic text extraction problem, resulting in significant progress by different research teams around the world. This has created a need for a freely available, open-source system that implements the advances described in the literature. In this paper we present BANNER, an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field. BANNER is implemented in Java as a machine-learning system based on conditional random fields and incorporates a wide range of the best techniques recently described in the literature. It is designed to maximize domain independence by avoiding brittle semantic features and rule-based processing steps, and it achieves significantly better performance than existing baseline systems. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists who need to find novel entities in large amounts of text.