In silico characterization of proteins: UniProt, InterPro and Integr8.

Authors

Nicola Jane Mulder, Paul Kersey, Manuela Pruess, Rolf Apweiler

Affiliation

EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Publication details

Mol Biotechnol. 2008 Feb;38(2):165-77. doi: 10.1007/s12033-007-9003-x. Epub 2007 Oct 4.

DOI: 10.1007/s12033-007-9003-x

PMID: 18219596

Abstract

Nucleic acid sequences from genome sequencing projects are submitted as raw data, from which biologists attempt to elucidate the function of the predicted gene products. The protein sequences are stored in public databases, such as the UniProt Knowledgebase (UniProtKB), where curators try to add predicted and experimental functional information. Protein function prediction can be done using sequence similarity searches, but an alternative approach is to use protein signatures, which classify proteins into families and domains. The major protein signature databases are available through the integrated InterPro database, which provides a classification of UniProtKB sequences. As well as characterization of proteins through protein families, many researchers are interested in analyzing the complete set of proteins from a genome (i.e. the proteome), and there are databases and resources that provide non-redundant proteome sets and analyses of proteins from organisms with completely sequenced genomes. This article reviews the tools and resources available on the web for single and large-scale protein characterization and whole proteome analysis.

Similar articles

InterPro and InterProScan: tools for protein sequence classification and comparison.

Methods Mol Biol. 2007;396:59-70. doi: 10.1007/978-1-59745-515-2_5.

Applications of InterPro in protein annotation and genome analysis.

Brief Bioinform. 2002 Sep;3(3):285-95. doi: 10.1093/bib/3.3.285.

InterPro protein classification.

Methods Mol Biol. 2011;694:37-47. doi: 10.1007/978-1-60761-977-2_3.

UniProt: the Universal Protein knowledgebase.

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D115-9. doi: 10.1093/nar/gkh131.

Integr8 and Genome Reviews: integrated views of complete genomes and proteomes.

Nucleic Acids Res. 2005 Jan 1;33(Database issue):D297-302. doi: 10.1093/nar/gki039.

UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

Methods Mol Biol. 2016;1374:23-54. doi: 10.1007/978-1-4939-3167-5_2.

UniRef: comprehensive and non-redundant UniProt reference clusters.

Bioinformatics. 2007 May 15;23(10):1282-8. doi: 10.1093/bioinformatics/btm098. Epub 2007 Mar 22.

InterPro: an integrated documentation resource for protein families, domains and functional sites.

Brief Bioinform. 2002 Sep;3(3):225-35. doi: 10.1093/bib/3.3.225.

UniProt: the universal protein knowledgebase.

Nucleic Acids Res. 2017 Jan 4;45(D1):D158-D169. doi: 10.1093/nar/gkw1099. Epub 2016 Nov 29.

Cited by

Mutation of Renders Resistant to First-Line Antibiotics Trimethoprim/Sulfamethoxazole and Levofloxacin.

Antibiotics (Basel). 2025 May 28;14(6):550. doi: 10.3390/antibiotics14060550.

Investigating diversity and similarity between CBM13 modules and ricin-B lectin domains using sequence similarity networks.

BMC Genomics. 2024 Jun 27;25(1):643. doi: 10.1186/s12864-024-10554-1.

The Role of the Gene on Melatonin Biosynthesis in : A Search of New Arylalkylamine -Acetyltransferases.

Microorganisms. 2023 Apr 25;11(5):1115. doi: 10.3390/microorganisms11051115.

Insights into Bioinformatic Applications for Glycosylation: Instigating an Awakening towards Applying Glycoinformatic Resources for Cancer Diagnosis and Therapy.

Int J Mol Sci. 2020 Dec 8;21(24):9336. doi: 10.3390/ijms21249336.

Assessment of Genetic Diversity, Population Structure, and Evolutionary Relationship of Uncharacterized Genes in a Novel Germplasm Collection of Diploid and Allotetraploid Accessions Using EST and Genomic SSR Markers.

Int J Mol Sci. 2018 Aug 14;19(8):2401. doi: 10.3390/ijms19082401.

Visualizing viral protein structures in cells using genetic probes for correlated light and electron microscopy.

Methods. 2015 Nov 15;90:39-48. doi: 10.1016/j.ymeth.2015.06.002. Epub 2015 Jun 9.

Genome based analysis of type-I polyketide synthase and nonribosomal peptide synthetase gene clusters in seven strains of five representative Nocardia species.

BMC Genomics. 2014 Apr 30;15(1):323. doi: 10.1186/1471-2164-15-323.

In silico prediction of antimalarial drug target candidates.

Int J Parasitol Drugs Drug Resist. 2012 Jul 17;2:191-9. doi: 10.1016/j.ijpddr.2012.07.002. eCollection 2012 Dec.

Protein domains of unknown function are essential in bacteria.

mBio. 2013 Dec 31;5(1):e00744-13. doi: 10.1128/mBio.00744-13.

Analysis of the Protein phosphotome of Entamoeba histolytica reveals an intricate phosphorylation network.

PLoS One. 2013 Nov 13;8(11):e78714. doi: 10.1371/journal.pone.0078714. eCollection 2013.

References

New developments in the InterPro database.

Nucleic Acids Res. 2007 Jan;35(Database issue):D224-8. doi: 10.1093/nar/gkl841.

GenBank.

Nucleic Acids Res. 2007 Jan;35(Database issue):D21-5. doi: 10.1093/nar/gkl986.

Database resources of the National Center for Biotechnology Information.

Nucleic Acids Res. 2007 Jan;35(Database issue):D5-12. doi: 10.1093/nar/gkl1031. Epub 2006 Dec 14.

TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes.

Nucleic Acids Res. 2007 Jan;35(Database issue):D260-4. doi: 10.1093/nar/gkl1043. Epub 2006 Dec 6.

EMBL Nucleotide Sequence Database in 2006.

Nucleic Acids Res. 2007 Jan;35(Database issue):D16-20. doi: 10.1093/nar/gkl913. Epub 2006 Dec 5.

Ensembl 2007.

Nucleic Acids Res. 2007 Jan;35(Database issue):D610-7. doi: 10.1093/nar/gkl996. Epub 2006 Dec 5.

IntAct--open source resource for molecular interaction data.

Nucleic Acids Res. 2007 Jan;35(Database issue):D561-5. doi: 10.1093/nar/gkl958. Epub 2006 Dec 1.

The Universal Protein Resource (UniProt).

Nucleic Acids Res. 2007 Jan;35(Database issue):D193-7. doi: 10.1093/nar/gkl929. Epub 2006 Nov 16.

Expanded protein information at SGD: new pages and proteome browser.

Nucleic Acids Res. 2007 Jan;35(Database issue):D468-71. doi: 10.1093/nar/gkl931. Epub 2006 Nov 16.

The mouse genome database (MGD): new features facilitating a model system.

Nucleic Acids Res. 2007 Jan;35(Database issue):D630-7. doi: 10.1093/nar/gkl940. Epub 2006 Nov 29.
