Protein structures and information extraction from biological texts: the PASTA system.

Authors

Gaizauskas R, Demetriou G, Artymiuk P J, Willett P

Affiliation

Department of Computer Science, University of Sheffield, Western Bank, UK.

Publication

Bioinformatics. 2003 Jan;19(1):135-43. doi: 10.1093/bioinformatics/19.1.135.

DOI:10.1093/bioinformatics/19.1.135
PMID:12499303
Abstract

MOTIVATION

The rapid increase in volume of protein structure literature means useful information may be hidden or lost in the published literature and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow.

RESULTS

We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by performing automatic extraction of information relating to the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively evaluated against manually annotated data and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date.

AVAILABILITY

PASTA makes its extraction results available via a browser-based front end: http://www.dcs.shef.ac.uk/nlp/pasta/. The evaluation resources (manually annotated corpora) are also available through the website: http://www.dcs.shef.ac.uk/nlp/pasta/results.html.

Similar articles

1
Protein structures and information extraction from biological texts: the PASTA system.
Bioinformatics. 2003 Jan;19(1):135-43. doi: 10.1093/bioinformatics/19.1.135.
2
GENIA corpus--semantically annotated corpus for bio-textmining.
Bioinformatics. 2003;19 Suppl 1:i180-2. doi: 10.1093/bioinformatics/btg1023.
3
Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors.
Bioinformatics. 2004 Mar 1;20(4):557-68. doi: 10.1093/bioinformatics/btg449. Epub 2004 Jan 22.
4
Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.
Bioinformatics. 2005 Apr 15;21(8):1653-8. doi: 10.1093/bioinformatics/bti165. Epub 2004 Nov 25.
5
BioRAT: extracting biological information from full-length papers.
Bioinformatics. 2004 Nov 22;20(17):3206-13. doi: 10.1093/bioinformatics/bth386. Epub 2004 Jul 1.
6
AnaGram: protein function assignment.
Bioinformatics. 2004 Jan 22;20(2):291-2. doi: 10.1093/bioinformatics/btg414.
7
Textpresso: an ontology-based information retrieval and extraction system for biological literature.
PLoS Biol. 2004 Nov;2(11):e309. doi: 10.1371/journal.pbio.0020309. Epub 2004 Sep 21.
8
Automatic extraction of gene/protein biological functions from biomedical text.
Bioinformatics. 2005 Apr 1;21(7):1227-36. doi: 10.1093/bioinformatics/bti084. Epub 2004 Oct 27.
9
MedBlast: searching articles related to a biological sequence.
Bioinformatics. 2004 Jan 1;20(1):75-7. doi: 10.1093/bioinformatics/btg375.
10
Zone analysis in biology articles as a basis for information extraction.
Int J Med Inform. 2006 Jun;75(6):468-87. doi: 10.1016/j.ijmedinf.2005.06.013. Epub 2005 Aug 19.

Cited by

1
PCfun: a hybrid computational framework for systematic characterization of protein complex function.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac239.
2
Named Entity Recognition and Relation Detection for Biomedical Information Extraction.
Front Cell Dev Biol. 2020 Aug 28;8:673. doi: 10.3389/fcell.2020.00673. eCollection 2020.
3
A gene-phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach.
Bioinformatics. 2018 Jul 1;34(13):i386-i394. doi: 10.1093/bioinformatics/bty263.
4
Framework for automatic information extraction from research papers on nanocrystal devices.
Beilstein J Nanotechnol. 2015 Sep 7;6:1872-82. doi: 10.3762/bjnano.6.190. eCollection 2015.
5
Identifying named entities from PubMed for enriching semantic categories.
BMC Bioinformatics. 2015 Feb 21;16:57. doi: 10.1186/s12859-015-0487-2.
6
Applications of natural language processing in biodiversity science.
Adv Bioinformatics. 2012;2012:391574. doi: 10.1155/2012/391574. Epub 2012 May 22.
7
Text mining improves prediction of protein functional sites.
PLoS One. 2012;7(2):e32171. doi: 10.1371/journal.pone.0032171. Epub 2012 Feb 29.
8
Connecting the dots between PubMed abstracts.
PLoS One. 2012;7(1):e29509. doi: 10.1371/journal.pone.0029509. Epub 2012 Jan 3.
9
Improving the extraction of complex regulatory events from scientific text by using ontology-based inference.
J Biomed Semantics. 2011 Oct 6;2 Suppl 5(Suppl 5):S3. doi: 10.1186/2041-1480-2-S5-S3.
10
What the papers say: text mining for genomics and systems biology.
Hum Genomics. 2010 Oct;5(1):17-29. doi: 10.1186/1479-7364-5-1-17.