BioInfer: a corpus for information extraction in the biomedical domain.

Authors

Pyysalo Sampo, Ginter Filip, Heimonen Juho, Björne Jari, Boberg Jorma, Järvinen Jouni, Salakoski Tapio

Affiliation

Turku Centre for Computer Science (TUCS), University of Turku, Lemminkäisenkatu 14a, 20520 Turku, Finland.

Publication

BMC Bioinformatics. 2007 Feb 9;8:50. doi: 10.1186/1471-2105-8-50.

Abstract

BACKGROUND

There has recently been great interest in applying information extraction methods to the biomedical domain, in particular to extracting relationships among genes, proteins, and RNA from scientific publications. Developing and evaluating such methods requires annotated domain corpora.

RESULTS

We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles, annotated for relationships, named entities, and syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation.
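To make the three annotation layers concrete, the following is a minimal sketch of how one sentence carrying named entities, relationships, and syntactic dependencies might be held in memory. It is an illustration only: the class names, field names, the example sentence, and the labels `BIND`, `subj`, and `obj` are all assumptions made here, and they do not reproduce the BioInfer file format, its ontologies, or its supporting software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A named entity: a typed set of token positions (hypothetical layout)."""
    id: str
    type: str             # illustrative type name, not taken from the BioInfer ontology
    token_indices: tuple  # positions of the entity's tokens in the sentence


@dataclass(frozen=True)
class Relationship:
    """A typed relationship whose arguments are entity ids."""
    type: str             # illustrative label
    argument_ids: tuple


@dataclass(frozen=True)
class Dependency:
    """A typed syntactic dependency between a head token and a dependent token."""
    type: str             # illustrative label
    head: int
    dependent: int


@dataclass
class Sentence:
    """One sentence carrying all three annotation layers over the same tokens."""
    tokens: list
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)

    def entity_text(self, entity_id):
        """Return the surface text of the entity with the given id."""
        entity = next(e for e in self.entities if e.id == entity_id)
        return " ".join(self.tokens[i] for i in entity.token_indices)


def relationship_texts(sentence):
    """List each relationship as (type, surface texts of its arguments)."""
    return [
        (rel.type, tuple(sentence.entity_text(a) for a in rel.argument_ids))
        for rel in sentence.relationships
    ]


# A made-up example sentence, not drawn from the corpus.
example = Sentence(
    tokens=["Profilin", "binds", "actin", "."],
    entities=[
        Entity(id="e1", type="protein", token_indices=(0,)),
        Entity(id="e2", type="protein", token_indices=(2,)),
    ],
    relationships=[Relationship(type="BIND", argument_ids=("e1", "e2"))],
    dependencies=[
        Dependency(type="subj", head=1, dependent=0),
        Dependency(type="obj", head=1, dependent=2),
    ],
)

print(relationship_texts(example))
```

Keeping all three layers on a single token list reflects the property the abstract highlights: the annotation types are combined for one set of sentences, so a relationship can be related directly to the dependencies linking its arguments.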

CONCLUSION

We introduce a corpus targeted at protein, gene, and RNA relationships that serves as a resource for developing information extraction systems and their components, such as parsers and domain analyzers. The corpus will be maintained and further developed; the current version is available at http://www.it.utu.fi/BioInfer.
