Biomedical named entity recognition using two-phase model based on SVMs.

Author information

Lee Ki-Joong, Hwang Young-Sook, Kim Seonho, Rim Hae-Chang

Affiliation

Natural Language Processing Laboratory, Department of Computer Science and Engineering, Korea University, 1, 5-ka, Anam-dong, Seoul 136-701, Republic of Korea.

Publication information

J Biomed Inform. 2004 Dec;37(6):436-47. doi: 10.1016/j.jbi.2004.08.012.

Abstract

Named entity (NE) recognition has become one of the most fundamental tasks in biomedical knowledge acquisition. In this paper, we present a two-phase named entity recognizer based on support vector machines (SVMs), consisting of a boundary identification phase and a semantic classification phase. When SVMs are applied to named entity recognition, the multi-class problem and the unbalanced class distribution problem become serious, both raising training cost and degrading performance. We address these problems by splitting NE recognition into two subtasks and using an appropriate SVM classifier and relevant features for each. In addition, we employ a hierarchical classification method based on an ontology, which effectively handles the multi-class problem in semantic classification. Experimental results on the GENIA corpus show that the proposed method both reduces computational cost and improves performance. The F-score (β = 1) is 74.8 for boundary identification and 66.7 for semantic classification.
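The two-phase decomposition can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes phase 1 has already produced B/I/O boundary tags, and it replaces every SVM with a hypothetical hand-written decision function (root_rule, molecule_rule, cell_rule) over a hypothetical two-level class tree. The tree descent is one common reading of ontology-based hierarchical classification: one classifier per internal node, applied top-down until a leaf class is reached. The evaluation helper assumes exact matching, so boundary F-score compares spans only while semantic F-score also requires the class to agree.

```python
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]          # [start, end) token offsets
Entity = Tuple[int, int, str]   # span plus semantic class


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode phase-1 boundary tags (B/I/O) into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:          # a new B closes the open span
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # an "I" inside an open span simply extends it
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical stand-ins for per-node SVMs: each picks one child class.
def root_rule(tokens: List[str]) -> str:
    return "cell" if tokens[-1].lower() in {"cell", "cells"} else "molecule"


def molecule_rule(tokens: List[str]) -> str:
    return "DNA" if tokens[-1].lower() in {"gene", "promoter"} else "protein"


def cell_rule(tokens: List[str]) -> str:
    return "cell_line" if any(c.isdigit() for t in tokens for c in t) else "cell_type"


# Hypothetical two-level class tree: internal node -> decision function.
TREE: Dict[str, Callable[[List[str]], str]] = {
    "ROOT": root_rule,
    "molecule": molecule_rule,
    "cell": cell_rule,
}


def classify_hierarchically(tokens: List[str], tree=TREE, node: str = "ROOT") -> str:
    """Phase 2: descend the tree until reaching a leaf (no classifier)."""
    while node in tree:
        node = tree[node](tokens)
    return node


def recognize(tokens: List[str], boundary_tags: List[str]) -> List[Entity]:
    """Two-phase pipeline: find spans first, then classify each span."""
    return [(s, e, classify_hierarchically(tokens[s:e]))
            for s, e in bio_to_spans(boundary_tags)]


def f_score(gold, pred, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) P R / (beta^2 P + R) over exact matches."""
    tp = len(set(gold) & set(pred))
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


tokens = ["NF-kappaB", "binds", "the", "kappaB", "site", "in", "human", "T", "cells"]
tags = ["B", "O", "O", "B", "I", "O", "B", "I", "I"]
gold = [(0, 1, "protein"), (3, 5, "DNA"), (6, 9, "cell_type")]

pred = recognize(tokens, tags)
boundary_f = f_score([g[:2] for g in gold], [x[:2] for x in pred])
semantic_f = f_score(gold, pred)
print(pred)                  # [(0, 1, 'protein'), (3, 5, 'protein'), (6, 9, 'cell_type')]
print(boundary_f)            # 1.0
print(round(semantic_f, 3))  # 0.667
```

In this toy run every boundary is found, but the stand-in molecule_rule mislabels "kappaB site" as a protein. Boundary F-score is therefore perfect while semantic F-score drops, mirroring how the abstract reports the two phases separately. Because phase 2 only ever sees spans that phase 1 proposed, each node classifier chooses among a few children instead of all classes at once, which is the multi-class reduction the hierarchical method targets.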
