Björne Jari, Salakoski Tapio
BMC Bioinformatics. 2015;16(Suppl 16):S4. doi: 10.1186/1471-2105-16-S16-S4. Epub 2015 Oct 30.
The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks.
The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event types, allowing immediate adaptation to the various subtasks and leading to first place in four out of eight tasks. In this paper, the system for the automated learning of annotation rules is further enhanced to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, making a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importance of the features in the TEES feature sets.
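As an illustration of the kind of feature-importance analysis described above, the following is a minimal sketch, not the TEES implementation. The abstract does not name the ensemble method, so the use of scikit-learn's `ExtraTreesClassifier` here is an assumption, as are the helper name `rank_features`, the feature names, and the synthetic data, which stands in for a real TEES feature matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def rank_features(X, y, names, n_estimators=100, seed=0):
    """Hypothetical helper: rank features by tree-ensemble importance.

    Assumes an extremely randomized trees ensemble; the abstract only
    says "a scikit-learn ensemble method".
    """
    forest = ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    # Sort feature indices from most to least important.
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


# Synthetic stand-in for a feature matrix (NOT real TEES features).
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
names = [f"feature_{i}" for i in range(X.shape[1])]

ranking = rank_features(X, y, names)
for name, score in ranking:
    print(f"{name}\t{score:.4f}")
```

The importances returned by the ensemble are normalized to sum to one, so each score can be read as a feature's relative share of the total. On real event-extraction feature sets, which are typically sparse and very high-dimensional, the same call pattern would apply to a sparse matrix, but runtime and memory would need to be checked.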
The TEES system was introduced for the BioNLP'09 Shared Task and has since demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.