UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB.

Author information

Doğan Tunca, MacDougall Alistair, Saidi Rabie, Poggioli Diego, Bateman Alex, O'Donovan Claire, Martin Maria J

Affiliation

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.

Publication information

Bioinformatics. 2016 Aug 1;32(15):2264-71. doi: 10.1093/bioinformatics/btw114. Epub 2016 Mar 7.

Abstract

MOTIVATION

Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. New approaches that overcome the limitations of methods relying solely on sequence similarity are attracting increasing attention. One such approach is to use the organization of structural domains in proteins.

RESULTS

We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) that compares their domain architectures, classifies proteins based on architecture similarity and propagates functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously unannotated protein entries, indicating the significant value added by this approach.
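The workflow described above can be illustrated with a minimal Python sketch. It compares domain architectures, transfers GO terms between proteins with similar architectures and scores predictions with an F-score. This is not the published UniProt-DAAC algorithm. The global alignment over ordered domain IDs, the match/mismatch/gap scores, the 0.8 similarity threshold and the function names align_score, similarity, propagate and f_score are all hypothetical choices made for illustration. The accessions, domain IDs and GO terms in the usage example are placeholders.

```python
from typing import Dict, List, Set

# Hypothetical scoring scheme (an assumption, not the published DAAC weights).
MATCH, MISMATCH, GAP = 1.0, -1.0, -0.5


def align_score(a: List[str], b: List[str]) -> float:
    """Global (Needleman-Wunsch) alignment score of two ordered domain-ID lists."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            dp[i][j] = max(diag, dp[i - 1][j] + GAP, dp[i][j - 1] + GAP)
    return dp[n][m]


def similarity(a: List[str], b: List[str]) -> float:
    """Alignment score scaled so that identical architectures score exactly 1.0."""
    longest = max(len(a), len(b))
    return align_score(a, b) / (MATCH * longest) if longest else 0.0


def propagate(query: List[str],
              annotated: Dict[str, List[str]],
              go_terms: Dict[str, Set[str]],
              threshold: float = 0.8) -> Set[str]:
    """Collect GO terms from every annotated protein with a similar enough architecture."""
    predicted: Set[str] = set()
    for acc, arch in annotated.items():
        if similarity(query, arch) >= threshold:
            predicted |= go_terms[acc]
    return predicted


def f_score(predicted: Set[str], true: Set[str]) -> float:
    """Harmonic mean of precision and recall over two GO term sets."""
    tp = len(predicted & true)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: all identifiers below are placeholders.
    annotated = {"P1": ["DOM_A", "DOM_B"], "P2": ["DOM_C"]}
    go_terms = {"P1": {"GO_X", "GO_Y"}, "P2": {"GO_Z"}}
    pred = propagate(["DOM_A", "DOM_B"], annotated, go_terms)
    print(sorted(pred))             # ['GO_X', 'GO_Y']
    print(f_score(pred, {"GO_X"}))  # precision 0.5, recall 1.0 -> 0.6666666666666666
```

Treating an architecture as an ordered string of domain IDs lets a standard sequence-alignment recurrence reward shared domains that appear in the same order. It also penalizes inserted or missing domains. The published method's scoring, classification and per-protein averaging of the F-score may differ.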

AVAILABILITY AND IMPLEMENTATION

The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/

CONTACT

tdogan@ebi.ac.uk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
