Ananiadou Sophia, Thompson Paul, Nawaz Raheel, McNaught John, Kell Douglas B
Brief Funct Genomics. 2015 May;14(3):213-30. doi: 10.1093/bfgp/elu015. Epub 2014 Jun 6.
The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature presents both a challenge and an opportunity for researchers who need to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has focused largely on the identification and extraction of 'events', i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover the annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research.
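As an illustration of what a 'categorised, structured representation' and a 'structured query over extracted events' might look like in practice, the following is a minimal Python sketch. It is not taken from the article or from any particular event extraction system: the class names `Entity` and `Event`, the function `find_events`, the role labels `Cause` and `Theme`, and the placeholder names `KinaseA`, `ProteinB` and `ProteinC` are all assumptions chosen for the example, and the events themselves are invented rather than extracted from real text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """A named biochemical entity mentioned in text (hypothetical schema)."""
    text: str
    entity_type: str


@dataclass(frozen=True)
class Event:
    """A typed event: a trigger phrase plus participants labelled with roles."""
    event_type: str
    trigger: str
    arguments: Tuple[Tuple[str, Entity], ...]  # (role, entity) pairs


def find_events(
    events: List[Event],
    event_type: Optional[str] = None,
    role: Optional[str] = None,
    entity_text: Optional[str] = None,
) -> List[Event]:
    """Return events that match every constraint that was supplied.

    The role and entity_text constraints must be satisfied by the same
    participant, so a query can ask for a specific entity in a specific role.
    """
    matches = []
    for event in events:
        if event_type is not None and event.event_type != event_type:
            continue
        if role is not None or entity_text is not None:
            has_participant = any(
                (role is None or r == role)
                and (entity_text is None or e.text == entity_text)
                for r, e in event.arguments
            )
            if not has_participant:
                continue
        matches.append(event)
    return matches


# Invented example events; the entity names are placeholders, not real findings.
EVENTS = [
    Event(
        "Phosphorylation",
        "phosphorylates",
        (("Cause", Entity("KinaseA", "Protein")),
         ("Theme", Entity("ProteinB", "Protein"))),
    ),
    Event(
        "Binding",
        "binds",
        (("Theme", Entity("ProteinB", "Protein")),
         ("Theme", Entity("ProteinC", "Protein"))),
    ),
    Event(
        "Phosphorylation",
        "phosphorylation of",
        (("Theme", Entity("ProteinC", "Protein")),),
    ),
]

# Structured query: phosphorylation events in which ProteinB is the Theme.
hits = find_events(
    EVENTS, event_type="Phosphorylation", role="Theme", entity_text="ProteinB"
)
for hit in hits:
    print(hit.event_type, hit.trigger, [(r, e.text) for r, e in hit.arguments])
```

A keyword search for "ProteinB" would return both the phosphorylation and the binding event above; constraining the event type and the participant role narrows the result to the single reaction of interest, which is the kind of precision the structured queries described in the abstract are meant to offer.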