IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1549-1559. doi: 10.1109/TCBB.2017.2710048.
Biological event extraction is an important step toward extracting biomedical knowledge from scientific publications: it captures biomedical entities and their complex relations from text. Event trigger identification, a crucial step in event extraction that assigns each word a suitable trigger category, has recently attracted substantial attention. Because triggers are scattered sparsely across a large corpus, traditional linguistic parsers struggle to generate syntactic features for them. This trigger sparsity problem restricts the model's learning process and is one of the main obstacles in trigger identification. In this paper, we employ Noise Contrastive Estimation with a Multi-Layer Perceptron model to address the trigger sparsity problem. In addition, in light of recent advances in distributed word representations, we use word-embedding features generated by a language model to capture semantic and syntactic information. Finally, an experimental study on the commonly used MLEE dataset shows promising results against baseline methods.
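The abstract does not give the exact objective, so the following is only a minimal sketch of a standard Noise Contrastive Estimation loss for a single token, not the authors' implementation. It assumes that `scores` holds unnormalized per-class scores (for example, the output layer of a Multi-Layer Perceptron fed with word embeddings), and that `noise_probs` is a hypothetical noise distribution over trigger classes; the names `nce_loss`, `scores`, `noise_samples`, and `noise_probs` are all illustrative.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nce_loss(scores, target, noise_samples, noise_probs):
    """NCE loss for one token (illustrative sketch).

    scores        -- unnormalized model score per trigger class (assumed MLP output)
    target        -- index of the true trigger class
    noise_samples -- class indices drawn from the noise distribution
    noise_probs   -- assumed noise distribution q over trigger classes
    """
    k = len(noise_samples)

    def logit(c):
        # Log-odds that class c came from the data rather than from noise.
        return scores[c] - math.log(k * noise_probs[c])

    # The true class should be classified as "data".
    loss = -log_sigmoid(logit(target))
    # Each noise sample should be classified as "noise":
    # log(1 - sigmoid(x)) equals log_sigmoid(-x).
    for c in noise_samples:
        loss -= log_sigmoid(-logit(c))
    return loss


# Hypothetical example: three trigger classes, two noise samples.
q = [0.5, 0.25, 0.25]
example_loss = nce_loss([2.0, 0.0, -1.0], target=0, noise_samples=[1, 2], noise_probs=q)
```

The loss turns the multi-class problem into binary discrimination between the observed class and sampled noise classes, so rare trigger classes are still contrasted against negatives on every update. Raising the score of the true class lowers the loss.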