Miwa Makoto, Saetre Rune, Kim Jin-Dong, Tsujii Jun'ichi
Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan.
J Bioinform Comput Biol. 2010 Feb;8(1):131-46. doi: 10.1142/s0219720010004586.
Biomedical Natural Language Processing (BioNLP) attempts to capture biomedical phenomena from texts by extracting relations between biomedical entities (i.e. proteins and genes). Traditionally, only binary relations have been extracted from large numbers of published papers. Recently, more complex relations (biomolecular events) have also been extracted. Such events may include several entities or other relations. To evaluate the performance of the text mining systems, several shared task challenges have been arranged for the BioNLP community. With a common and consistent task setting, the BioNLP'09 shared task evaluated complex biomolecular events such as binding and regulation. Finding these events automatically is important in order to improve biomedical event extraction systems. In the present paper, we propose an automatic event extraction system, which contains a model for complex events, by solving a classification problem with rich features. The main contributions of the present paper are: (1) the proposal of an effective bio-event detection method using machine learning, (2) provision of a high-performance event extraction system, and (3) the execution of a quantitative error analysis. The proposed complex (binding and regulation) event detector outperforms the best system from the BioNLP'09 shared task challenge.