Inferring sub-cellular localization through automated lexical analysis.

Author information

Nair Rajesh, Rost Burkhard

Affiliation

CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.

Publication information

Bioinformatics. 2002;18 Suppl 1:S78-86. doi: 10.1093/bioinformatics/18.suppl_1.s78.

Abstract

MOTIVATION

The SWISS-PROT sequence database contains keywords of functional annotations for many proteins. In contrast, information about the sub-cellular localization is available for only a few proteins. Experts can often infer localization from keywords describing protein function. We developed LOCkey, a fully automated method for lexical analysis of SWISS-PROT keywords that assigns sub-cellular localization. With the rapid growth in sequence data, the biochemical characterisation of sequences has been falling behind. Our method may be a useful tool for supplementing functional information already automatically available.
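
To make the idea of inferring localization from keywords concrete, the following is a minimal sketch of one possible keyword-voting scheme. It is not the LOCkey algorithm, whose details are not given in this abstract. The training entries, the names `TRAINING`, `build_keyword_index` and `infer_localization`, and the simple majority vote are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: (keywords, known localization).
# These entries are illustrative only and are not taken from SWISS-PROT.
TRAINING = [
    ({"Transcription regulation", "DNA-binding"}, "nucleus"),
    ({"DNA-binding", "Zinc-finger"}, "nucleus"),
    ({"Glycolysis", "Kinase"}, "cytoplasm"),
    ({"Electron transport", "Transit peptide"}, "mitochondrion"),
]


def build_keyword_index(training):
    """Count how often each keyword co-occurs with each localization."""
    index = defaultdict(Counter)
    for keywords, localization in training:
        for keyword in keywords:
            index[keyword][localization] += 1
    return index


def infer_localization(keywords, index):
    """Return the localization with the most keyword votes, or None.

    None means no keyword of the query protein was seen in training,
    mirroring the case where functional annotation is missing.
    """
    votes = Counter()
    for keyword in keywords:
        # .get avoids creating empty entries in the defaultdict.
        votes.update(index.get(keyword, {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


INDEX = build_keyword_index(TRAINING)
```

Under these assumptions, a protein annotated only with `DNA-binding` would be assigned `nucleus`, while a protein whose keywords never appear in the training set receives no prediction, which corresponds to the coverage limit discussed under RESULTS.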

RESULTS

The method reached a level of more than 82% accuracy in a full cross-validation test. Due to a lack of functional annotations, we could infer localization for fewer than half of all proteins in SWISS-PROT. We applied LOCkey to annotate five entirely sequenced proteomes, namely Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Arabidopsis thaliana (plant) and a subset of all human proteins. LOCkey found about 8000 new annotations of sub-cellular localization for these eukaryotes.