An analysis of gene/protein associations at PubMed scale.

Author information

Pyysalo Sampo, Ohta Tomoko, Tsujii Jun'ichi

Affiliation

Department of Computer Science, University of Tokyo, Tokyo, Japan.

Publication information

J Biomed Semantics. 2011 Oct 6;2 Suppl 5(Suppl 5):S5. doi: 10.1186/2041-1480-2-S5-S5.

Abstract

BACKGROUND

Event extraction following the GENIA Event corpus and BioNLP shared task models has been a considerable focus of recent work in biomedical information extraction. This work includes efforts applying event extraction methods to the entire PubMed literature database, far beyond the narrow subdomains of biomedicine for which annotated resources for extraction method development are available.

RESULTS

In the present study, our aim is to estimate the coverage of all statements of gene/protein associations in PubMed that existing resources for event extraction can provide. We base our analysis on a recently released corpus automatically annotated for gene/protein entities and syntactic analyses covering the entire PubMed, and use named entity co-occurrence, shortest dependency paths and an unlexicalized classifier to identify likely statements of gene/protein associations. A set of high-frequency/high-likelihood association statements is then manually analyzed with reference to the GENIA ontology.
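
As an illustration of the shortest-dependency-path step, the following is a minimal sketch and not the authors' implementation. It assumes a sentence's dependency parse is available as (head, dependent, label) triples over token indices and that two co-occurring gene/protein mentions are identified by their token indices. The function name `shortest_dependency_path`, the arc-direction markers `<` and `>`, and the toy sentence are all hypothetical. Returning only dependency labels, with no words, is one plausible reading of the kind of feature an unlexicalized classifier might use.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the dependency labels on the shortest path from source to target.

    edges: list of (head, dependent, label) triples over token indices.
    Arcs are traversed in both directions; "label>" marks a step from head
    to dependent and "<label" a step from dependent to head.
    Returns None if the two tokens are not connected.
    """
    # Build an undirected adjacency list that remembers arc direction.
    graph = {}
    for head, dep, label in edges:
        graph.setdefault(head, []).append((dep, label + ">"))
        graph.setdefault(dep, []).append((head, "<" + label))

    # Breadth-first search finds a shortest path in an unweighted graph.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor, step in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, step)
                queue.append(neighbor)

    if target not in previous:
        return None

    # Walk back from the target to the source, collecting labels.
    path = []
    node = target
    while previous[node] is not None:
        node, step = previous[node]
        path.append(step)
    path.reverse()
    return path


# Toy parse of "ProtA binds ProtB": token 0 = ProtA, 1 = binds, 2 = ProtB.
edges = [(1, 0, "nsubj"), (1, 2, "dobj")]
print(shortest_dependency_path(edges, 0, 2))
```

In this sketch, each pair of entity mentions co-occurring in a sentence would yield one label sequence, and identical sequences could then be counted across the corpus to find high-frequency patterns.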

CONCLUSIONS

We present a first estimate of the overall coverage of gene/protein associations provided by existing resources for event extraction. Our results suggest that for event-type associations this coverage may be over 90%. We also identify several biologically significant associations of genes and proteins that are not addressed by these resources, suggesting directions for further extension of extraction coverage.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96f2/3239305/a034c25c1455/2041-1480-2-S5-S5-1.jpg
