Department of Information Technology, University of Turku, Turku Centre for Computer Science (TUCS), Joukahaisenkatu 3-5, 20520 Turku, Finland.
BMC Bioinformatics. 2012 Jun 26;13 Suppl 11(Suppl 11):S4. doi: 10.1186/1471-2105-13-S11-S4.
We present a system for extracting biomedical events (detailed descriptions of biomolecular interactions) from research articles, developed for the BioNLP'11 Shared Task. Our goal is to develop a system that is easily adaptable to different event schemes, in keeping with the theme of the BioNLP'11 Shared Task: generalization, that is, the extension of event extraction to varied biomedical domains. Our system extends the Turku Event Extraction System, which won the BioNLP'09 Shared Task and uses support vector machines to first detect event-defining words and then detect the relationships between them.
Our current system successfully predicts events for every domain case introduced in the BioNLP'11 Shared Task, being the only system to participate in all eight tasks and all of their subtasks, with best performance in four tasks. Following the Shared Task, we improve the system on the Infectious Diseases task from 42.57% to 53.87% F-score, bringing performance into line with the similar GENIA Event Extraction and Epigenetics and Post-translational Modifications tasks. We evaluate the machine learning performance of the system by calculating learning curves for all tasks, detecting areas where additional annotated data could be used to improve performance. Finally, we evaluate the use of system output on external articles as additional training data in a form of self-training.
We show that the updated Turku Event Extraction System can easily be adapted to all presently available event extraction targets, with competitive performance in most tasks. The scope of the performance gains between the 2009 and 2011 BioNLP Shared Tasks indicates event extraction is still a new field requiring more work. We provide several analyses of event extraction methods and performance, highlighting potential future directions for continued development.
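For readers unfamiliar with the metric, the F-scores quoted above can be illustrated with a minimal sketch. It assumes the usual balanced F-score (the harmonic mean of precision and recall). The function name `f_score` and the example counts are hypothetical and chosen only for illustration; only the 42.57% and 53.87% figures come from the abstract.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the balanced F-score (harmonic mean of precision and recall) in percent."""
    predicted = true_positives + false_positives   # events the system proposed
    gold = true_positives + false_negatives        # events in the gold annotation
    if predicted == 0 or gold == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / gold
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
example = f_score(true_positives=50, false_positives=25, false_negatives=50)

# F-scores reported in the abstract for the Infectious Diseases task.
before, after = 42.57, 53.87
gain_points = round(after - before, 2)  # absolute gain in percentage points

print(f"example F-score: {example:.2f}%")
print(f"Infectious Diseases gain: {gain_points} percentage points")
```

Under these assumptions, the improvement on the Infectious Diseases task corresponds to an absolute gain of 11.3 percentage points.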