Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

Author information

Santos Carlos, Eggle Daniela, States David J

Affiliation

Bioinformatics Program, The University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Bioinformatics. 2005 Apr 15;21(8):1653-8. doi: 10.1093/bioinformatics/bti165. Epub 2004 Nov 25.

Abstract

MOTIVATION

Wnt signaling is a very active area of research, with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases that describe signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction had been identified as of late 2003. In this work we describe a natural language processing (NLP) system that identifies references to biological interaction networks in free text and automatically assembles a protein association and interaction map.

RESULTS

A 'gold standard' set of names and assertions, comprising 53 interactions involved in Wnt signaling, was derived by manually scanning the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html). The system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling, including 3369 PubMed and 1230 full-text papers. Names for key Wnt-pathway-associated proteins and biological entities were identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature compared with the general signal transduction literature. Interestingly, we identified several instances where the website used generic terms although more specific terms occur in the literature, and one typographic error in the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction over the corpus, the system successfully identified 34 of the 53 interactions in the 'gold standard' Wnt signaling set (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules that were missing from, or different from those in, the canonical diagram; these were confirmed by manual review of the text. These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases.
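
The chi-squared over-representation step and the recall figure above can be illustrated with a minimal sketch. This is not the authors' pipeline: the 2x2 contingency layout, the 3.841 cutoff (the 5% critical value for one degree of freedom), the function names, and the toy phrase counts are all assumptions made for illustration. Only the 34-of-53 recall figure comes from the abstract.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table.

    a: count of the phrase in the target corpus
    b: count of all other phrases in the target corpus
    c: count of the phrase in the background corpus
    d: count of all other phrases in the background corpus
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def overrepresented_phrases(target_counts, background_counts, threshold=3.841):
    """Return (phrase, statistic) pairs enriched in the target corpus, highest first.

    The threshold is an assumed cutoff, not a value taken from the paper.
    """
    target_total = sum(target_counts.values())
    background_total = sum(background_counts.values())
    hits = []
    for phrase, a in target_counts.items():
        c = background_counts.get(phrase, 0)
        # Skip phrases that are not relatively more frequent in the target corpus.
        if a * background_total <= c * target_total:
            continue
        stat = chi_squared_2x2(a, target_total - a, c, background_total - c)
        if stat >= threshold:
            hits.append((phrase, stat))
    return sorted(hits, key=lambda item: item[1], reverse=True)


def recall(found, gold_total):
    """Fraction of gold-standard items that were recovered."""
    return found / gold_total


# Hypothetical noun-phrase counts, invented purely for illustration.
wnt_counts = {"beta-catenin": 120, "frizzled": 60, "protein kinase": 200, "cell": 620}
background_counts = {"beta-catenin": 15, "frizzled": 2, "protein kinase": 900, "cell": 4083}

hits = overrepresented_phrases(wnt_counts, background_counts)
for phrase, stat in hits:
    print(f"{phrase}: chi-squared = {stat:.1f}")

# Recall reported in the abstract: 34 of 53 gold-standard interactions.
print(f"recall = {recall(34, 53):.0%}")
```

With these toy counts, only the two Wnt-specific phrases pass the cutoff, while phrases that are common in both corpora are filtered out. A real run would also need a noun-phrase chunker and a much larger background corpus.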

AVAILABILITY

The pipeline software components are freely available from the authors on request.

CONTACT

dstates@umich.edu

SUPPLEMENTARY INFORMATION

http://stateslab.bioinformatics.med.umich.edu/software.html.
