Complex event extraction at PubMed scale.

Affiliation

Department of Information Technology, University of Turku, Turku, Finland.

Publication information

Bioinformatics. 2010 Jun 15;26(12):i382-90. doi: 10.1093/bioinformatics/btq180.

Abstract

MOTIVATION

There has recently been a notable shift in biomedical information extraction (IE) from relation models toward the more expressive event model, facilitated by the maturation of basic tools for biomedical text analysis and the availability of manually annotated resources. The event model allows detailed representation of complex natural language statements and can support a number of advanced text mining applications ranging from semantic search to pathway extraction. A recent collaborative evaluation demonstrated the potential of event extraction systems, yet there have so far been no studies of the generalization ability of the systems nor the feasibility of large-scale extraction.
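To make the contrast between relation and event models concrete, here is a minimal sketch in Python. It is not the data model of the system described in this abstract. The class names, the placeholder proteins `ProtA` and `ProtB`, and the example sentence are assumptions made for illustration, and the type and role labels (`Phosphorylation`, `Positive_regulation`, `Theme`, `Cause`) only imitate the style of shared-task event annotations. The point is that an event can take another event as an argument, whereas a binary relation can only link two entities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str  # e.g. "Protein"


@dataclass(frozen=True)
class Relation:
    """Relation model: one typed link between exactly two entities."""
    kind: str
    left: Entity
    right: Entity


@dataclass(frozen=True)
class Event:
    """Event model: a typed trigger word plus role-labelled arguments.

    Each argument is a (role, node) pair, where node is an Entity or
    another Event, so events can nest.
    """
    kind: str
    trigger: str
    args: tuple


def depth(node):
    """Nesting depth: 0 for an entity, 1 for a flat event, and so on."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(arg) for _, arg in node.args), default=0)


# Hypothetical sentence: "ProtA induces phosphorylation of ProtB."
prot_a = Entity("ProtA", "Protein")
prot_b = Entity("ProtB", "Protein")

# A relation can only record that the two proteins are linked somehow.
flat = Relation("interacts", prot_a, prot_b)

# The event model keeps the inner event and the regulation acting on it.
phos = Event("Phosphorylation", "phosphorylation", (("Theme", prot_b),))
reg = Event("Positive_regulation", "induces",
            (("Cause", prot_a), ("Theme", phos)))

print(depth(phos), depth(reg))
```

In this sketch the relation loses both the direction of the effect and the fact that it is a phosphorylation that is being regulated, while the nested event keeps both.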

RESULTS

This study considers event-based IE at PubMed scale. We introduce a system combining publicly available, state-of-the-art methods for domain parsing, named entity recognition and event extraction, and test the system on a representative 1% sample of all PubMed citations. We present the first evaluation of the generalization performance of event extraction systems to this scale and show that despite its computational complexity, event extraction from the entire PubMed is feasible. We further illustrate the value of the extraction approach through a number of analyses of the extracted information.
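The abstract does not say how the 1% sample was drawn. As a sketch under that caveat, one simple way to take a reproducible uniform sample of citation identifiers is shown below. The function name `sample_citations`, the fixed seed, and the use of consecutive integers as stand-in PubMed identifiers are all assumptions for illustration.

```python
import random


def sample_citations(pmids, fraction=0.01, seed=0):
    """Return a sorted uniform random sample of the given identifiers.

    A fixed seed makes the sample reproducible across runs.
    """
    rng = random.Random(seed)
    k = max(1, round(len(pmids) * fraction))
    return sorted(rng.sample(pmids, k))


# Stand-in identifiers; a real run would read them from a PubMed export.
all_pmids = list(range(1, 10001))
subset = sample_citations(all_pmids)
print(len(subset))
```

Each sampled citation would then pass through the parsing, named entity recognition and event extraction stages in turn, which is what makes a small uniform sample a cheap way to estimate cost and output volume before a full run.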

AVAILABILITY

The event detection system and extracted data are open source licensed and available at http://bionlp.utu.fi/.
