ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

Affiliation

Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil.

Publication information

Bioinformatics. 2010 Mar 1;26(5):705-7. doi: 10.1093/bioinformatics/btq011. Epub 2010 Jan 19.

Abstract

MOTIVATION

Many analyses in modern biological research are based on comparisons between biological sequences, which lead to functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used, at some cost in accuracy. To improve and validate the results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences from the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely computationally intensive approach was made possible by World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains the coordinates of pairwise protein alignments and their respective scores, is now available. Users can download, compare and analyze the results, filtered by genome, protein function or cluster. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, the NCBI Taxonomy database and Gene Ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes, using a rigorous dynamic programming approach.

AVAILABILITY

The database can be accessed through http://proteinworlddb.org
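
To illustrate what a stored record of "alignment coordinates and score" might look like, the following is a minimal sketch of Smith-Waterman local alignment applied all-against-all to a toy protein set. It is not the project's implementation: the function names, the simple match/mismatch scoring, the linear gap penalty and the 0-based half-open coordinates are all assumptions made for illustration. A production run would more likely use a substitution matrix and affine gap penalties.

```python
from itertools import combinations


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b (illustrative scoring, linear gaps).

    Returns (score, (a_start, a_end), (b_start, b_end)) with 0-based,
    half-open coordinates, so a[a_start:a_end] is the aligned region of a.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # dynamic programming matrix
    best, best_pos = 0, (0, 0)

    # Fill step: each cell holds the best score of a local alignment ending there.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


def all_against_all(proteins):
    """Align every unordered pair in a {name: sequence} dict (hypothetical helper)."""
    return {
        (p, q): smith_waterman(proteins[p], proteins[q])
        for p, q in combinations(sorted(proteins), 2)
    }


if __name__ == "__main__":
    toy = {"protA": "MKVLAW", "protB": "GGKVLAPP", "protC": "HHHH"}
    for pair, hit in all_against_all(toy).items():
        print(pair, hit)
```

The fill step costs time proportional to the product of the two sequence lengths, and an all-against-all run repeats it for every pair. This is why an exhaustive comparison of millions of sequences needed grid-scale computing rather than a single machine.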
