Protein structures and information extraction from biological texts: the PASTA system.

Authors

Gaizauskas R, Demetriou G, Artymiuk P J, Willett P

Affiliation

Department of Computer Science, University of Sheffield, Western Bank, UK.

Publication information

Bioinformatics. 2003 Jan;19(1):135-43. doi: 10.1093/bioinformatics/19.1.135.

Abstract

MOTIVATION

The rapid increase in the volume of protein structure literature means that useful information may be hidden or lost in published papers, and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow.

RESULTS

We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by performing automatic extraction of information relating to the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively evaluated against manually annotated data and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date.

AVAILABILITY

PASTA makes its extraction results available via a browser-based front end: http://www.dcs.shef.ac.uk/nlp/pasta/. The evaluation resources (manually annotated corpora) are also available through the website: http://www.dcs.shef.ac.uk/nlp/pasta/results.html.
