Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

Author information

Santos Carlos, Eggle Daniela, States David J

Affiliation

Bioinformatics Program, The University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Bioinformatics. 2005 Apr 15;21(8):1653-8. doi: 10.1093/bioinformatics/bti165. Epub 2004 Nov 25.

PMID:15564295

Abstract

MOTIVATION

Wnt signaling is a very active area of research, with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases that describe signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge; for instance, more than 50 factors involved in Wnt signal transduction had been identified as of late 2003. In this work we describe a natural language processing (NLP) system that identifies references to biological interaction networks in free text and automatically assembles a protein association and interaction map.
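As a rough illustration of the final step, the sketch below turns sentences into (agent, verb, target) assertions and merges them into a directed interaction map. It is a minimal sketch under stated assumptions, not the authors' pipeline: the trigger-verb list, the one-token-per-argument regular expression, and the helper names `extract_assertions` and `build_interaction_map` are hypothetical stand-ins for the statistical, partial-parse, and full-parse components the paper describes, and the example sentences are made up.

```python
import re

# Hypothetical trigger verbs, for illustration only. The paper's system relies on
# statistical methods plus partial and full parses, not on this fixed list.
INTERACTION_VERBS = ("interacts with", "binds", "phosphorylates", "inhibits", "activates")
_VERB_ALTERNATION = "|".join(re.escape(verb) for verb in INTERACTION_VERBS)
# One whitespace-delimited token, a trigger verb, one token: "Axin binds GSK3".
_ASSERTION_RE = re.compile(rf"(\S+)\s+({_VERB_ALTERNATION})\s+(\S+)")


def extract_assertions(sentence, entities):
    """Return (agent, verb, target) triples whose arguments are both known entities.

    Matches are non-overlapping and each argument is a single token, so
    multi-word names and anything beyond a simple subject-verb-object
    pattern are missed.
    """
    triples = []
    for match in _ASSERTION_RE.finditer(sentence):
        agent, verb, target = match.groups()
        target = target.rstrip(".,;:")  # drop trailing punctuation
        if agent in entities and target in entities:
            triples.append((agent, verb, target))
    return triples


def build_interaction_map(sentences, entities):
    """Merge assertions into a directed map: (agent, target) -> set of verbs seen."""
    interaction_map = {}
    for sentence in sentences:
        for agent, verb, target in extract_assertions(sentence, entities):
            interaction_map.setdefault((agent, target), set()).add(verb)
    return interaction_map


if __name__ == "__main__":
    # Made-up example sentences and entity list.
    entities = {"Axin", "GSK3", "beta-catenin", "Dishevelled"}
    sentences = [
        "Axin binds GSK3 and GSK3 phosphorylates beta-catenin.",
        "Dishevelled inhibits GSK3.",
    ]
    for (agent, target), verbs in build_interaction_map(sentences, entities).items():
        print(agent, "->", target, sorted(verbs))
```

Keying the map on (agent, target) keeps every verb observed for a pair, so repeated mentions across papers accumulate as evidence for one edge rather than creating duplicate edges.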

RESULTS

A 'gold standard' set of names and assertions, including 53 interactions involved in Wnt signaling, was derived by manually scanning the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html). The system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling, comprising 3369 PubMed articles and 1230 full-text papers. Names of key Wnt-pathway-associated proteins and other biological entities were identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature compared with the general signal transduction literature. Interestingly, we identified several instances where the website used generic terms although more specific terms occur in the literature, as well as one typographical error in the canonical Wnt pathway. Using the named-entity list and performing an exhaustive assertion extraction over the corpus, we successfully identified 34 of the 53 interactions in the 'gold standard' Wnt signaling set (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules that were missing from, or differed from those in, the canonical diagram; these were confirmed by manual review of the text. These results suggest that a combination of NLP techniques for information extraction can serve as a useful first-pass tool to assist human annotation and maintenance of signaling pathway databases.
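The two quantitative steps in this paragraph can be sketched as follows, assuming noun phrases have already been chunked and counted per corpus. `rank_overrepresented` scores each phrase with a Pearson chi-squared statistic on a 2×2 table (this phrase versus all other phrases, Wnt corpus versus background signal-transduction corpus) and keeps only phrases that are relatively more frequent in the Wnt corpus. The counting unit, the absence of a continuity correction, and any significance cutoff are assumptions, since the abstract does not specify them. `recall` compares extracted interactions with the gold standard, treating each interaction as an unordered pair, which is also an assumption. All function names are hypothetical.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the table
    [[a, b], [c, d]]: rows are corpora, columns are 'this phrase' vs 'other phrases'."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table; no evidence either way
    return n * (a * d - b * c) ** 2 / denominator


def rank_overrepresented(focus_counts, background_counts):
    """Rank phrases over-represented in the focus (e.g. Wnt) corpus.

    Both arguments map a noun phrase to its occurrence count in that corpus.
    Returns (phrase, chi2) pairs, highest statistic first.
    """
    focus_total = sum(focus_counts.values())
    background_total = sum(background_counts.values())
    scored = []
    for phrase, a in focus_counts.items():
        c = background_counts.get(phrase, 0)
        b = focus_total - a
        d = background_total - c
        # Keep only phrases whose relative frequency is higher in the focus corpus:
        # a / focus_total > c / background_total, compared without division.
        if a * background_total > c * focus_total:
            scored.append((phrase, chi_squared_2x2(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def recall(found, gold):
    """Fraction of gold-standard interactions recovered, treating pairs as unordered."""
    gold_pairs = {frozenset(pair) for pair in gold}
    found_pairs = {frozenset(pair) for pair in found}
    return len(gold_pairs & found_pairs) / len(gold_pairs)


if __name__ == "__main__":
    # Toy counts, not data from the paper.
    wnt = {"beta-catenin": 30, "protein": 50, "Frizzled": 20}
    background = {"beta-catenin": 2, "protein": 100, "Frizzled": 1}
    for phrase, chi2 in rank_overrepresented(wnt, background):
        print(f"{phrase}: {chi2:.2f}")
```

With 34 of the 53 gold-standard pairs recovered, `recall` returns 34/53 ≈ 0.642, the 64% figure reported above. Filtering on relative frequency before scoring matters because the chi-squared statistic alone is symmetric: a generic phrase that is strongly under-represented in the Wnt corpus would otherwise rank as highly as a Wnt-specific one.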

AVAILABILITY

The pipeline software components are freely available from the authors on request.

CONTACT

dstates@umich.edu

SUPPLEMENTARY INFORMATION

http://stateslab.bioinformatics.med.umich.edu/software.html.

Similar articles

Recognizing names in biomedical texts: a machine learning approach.

Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.

RelEx--relation extraction using dependency parse trees.

Bioinformatics. 2007 Feb 1;23(3):365-71. doi: 10.1093/bioinformatics/btl616. Epub 2006 Dec 1.

Gene name ambiguity of eukaryotic nomenclatures.

Bioinformatics. 2005 Jan 15;21(2):248-56. doi: 10.1093/bioinformatics/bth496. Epub 2004 Aug 27.

Extracting human protein interactions from MEDLINE using a full-sentence parser.

Bioinformatics. 2004 Mar 22;20(5):604-11. doi: 10.1093/bioinformatics/btg452. Epub 2004 Jan 22.

Combination of text-mining algorithms increases the performance.

Bioinformatics. 2006 Sep 1;22(17):2151-7. doi: 10.1093/bioinformatics/btl281. Epub 2006 Jun 9.

Concept-based annotation of enzyme classes.

Bioinformatics. 2005 May 1;21(9):2059-66. doi: 10.1093/bioinformatics/bti284. Epub 2005 Jan 20.

Inter-species normalization of gene mentions with GNAT.

Bioinformatics. 2008 Aug 15;24(16):i126-132. doi: 10.1093/bioinformatics/btn299.

Distributed modules for text annotation and IE applied to the biomedical domain.

Int J Med Inform. 2006 Jun;75(6):496-500. doi: 10.1016/j.ijmedinf.2005.06.011. Epub 2005 Aug 8.

Negation of protein-protein interactions: analysis and extraction.

Bioinformatics. 2007 Jul 1;23(13):i424-32. doi: 10.1093/bioinformatics/btm184.

Cited by

Bridging artificial intelligence and biological sciences: a comprehensive review of large language models in bioinformatics.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf357.

Biocuration with insufficient resources and fixed timelines.

Database (Oxford). 2015 Dec 26;2015. doi: 10.1093/database/bav116. Print 2015.

Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies.

J Pathol Inform. 2013 Jul 31;4:20. doi: 10.4103/2153-3539.115880. eCollection 2013.

Automatic extraction of biomolecular interactions: an empirical approach.

BMC Bioinformatics. 2013 Jul 24;14:234. doi: 10.1186/1471-2105-14-234.

A text-mining system for extracting metabolic reactions from full-text articles.

BMC Bioinformatics. 2012 Jul 23;13:172. doi: 10.1186/1471-2105-13-172.

What the papers say: text mining for genomics and systems biology.

Hum Genomics. 2010 Oct;5(1):17-29. doi: 10.1186/1479-7364-5-1-17.

BSQA: integrated text mining using entity relation semantics extracted from biological literature of insects.

Nucleic Acids Res. 2010 Jul;38(Web Server issue):W175-81. doi: 10.1093/nar/gkq544.

Biomedical text mining and its applications.

PLoS Comput Biol. 2009 Dec;5(12):e1000597. doi: 10.1371/journal.pcbi.1000597. Epub 2009 Dec 24.

PathBinder--text empirics and automatic extraction of biomolecular interactions.

BMC Bioinformatics. 2009 Oct 8;10 Suppl 11(Suppl 11):S18. doi: 10.1186/1471-2105-10-S11-S18.

Public databases and software for the pathway analysis of cancer genomes.

Cancer Inform. 2007 Dec 12;3:379-97.
