UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB.

Author information

Doğan Tunca, MacDougall Alistair, Saidi Rabie, Poggioli Diego, Bateman Alex, O'Donovan Claire, Martin Maria J

Affiliation

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.

Publication information

Bioinformatics. 2016 Aug 1;32(15):2264-71. doi: 10.1093/bioinformatics/btw114. Epub 2016 Mar 7.

DOI:10.1093/bioinformatics/btw114

PMID:27153729

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4965628/

Abstract

MOTIVATION

Similarity-based methods have been widely used in order to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods that rely solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins.

RESULTS

We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) that compares their domain architectures, classifies proteins based on the similarities and propagates functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach.
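
As a rough illustration of the pipeline summarized above (compare domain architectures, group similar proteins, propagate GO terms, then score predictions with an F-score), the following is a minimal sketch and not the authors' implementation. It assumes a domain architecture can be represented as an ordered list of domain identifiers, and it swaps in Python's generic `difflib.SequenceMatcher` for the paper's own alignment scheme. The function names, the 0.75 threshold, the domain identifiers (`D1`, `D2`, ...) and the GO labels are all hypothetical placeholders.

```python
from difflib import SequenceMatcher


def architecture_similarity(a, b):
    """Similarity in [0, 1] between two ordered lists of domain identifiers.

    Assumption: a generic sequence-matching ratio stands in for the
    domain architecture alignment score used in the paper.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def propagate_go_terms(query, annotated, threshold=0.75):
    """Collect GO terms from annotated proteins with a similar architecture.

    `annotated` is a list of (architecture, go_terms) pairs.
    The threshold value is an arbitrary illustrative choice.
    """
    terms = set()
    for architecture, go_terms in annotated:
        if architecture_similarity(query, architecture) >= threshold:
            terms |= go_terms
    return terms


def f_score(predicted, actual):
    """Harmonic mean of precision and recall over two sets of GO terms."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference set: architectures with their known GO terms.
annotated = [
    (["D1", "D2", "D3"], {"GO:x", "GO:y"}),
    (["D1", "D2"], {"GO:x"}),
    (["D4"], {"GO:z"}),
]

predicted = propagate_go_terms(["D1", "D2", "D3"], annotated)
score = f_score(predicted, {"GO:x", "GO:y", "GO:w"})
```

In a cross-validation setting such as the one the abstract describes, the query protein's own annotation would be held out of `annotated`, and `f_score` would be averaged over all held-out proteins.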

AVAILABILITY AND IMPLEMENTATION

The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/

CONTACT

tdogan@ebi.ac.uk

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
