ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

Affiliation

Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil.

Publication information

Bioinformatics. 2010 Mar 1;26(5):705-7. doi: 10.1093/bioinformatics/btq011. Epub 2010 Jan 19.

DOI:10.1093/bioinformatics/btq011

PMID:20089515

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2828119/

Abstract

MOTIVATION

Many analyses in modern biological research are based on comparisons between biological sequences, from which functional, evolutionary and structural inferences are drawn. When large numbers of sequences are compared, heuristics are often used, at some cost in accuracy. To improve and validate the results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences from the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains the coordinates of pairwise protein alignments and their respective scores, is now available. Users can download, compare and analyze the results, filtered by genome, protein function or cluster. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, the NCBI Taxonomy database and Gene Ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach.

AVAILABILITY

The database can be accessed through http://proteinworlddb.org
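
The abstract describes storing, for each protein pair, the alignment coordinates and score produced by a Smith-Waterman implementation. As a rough illustration of what such a record involves, the sketch below computes a local alignment score and its start and end coordinates by dynamic programming. It is a minimal sketch under stated assumptions, not the project's implementation: the function name `smith_waterman`, the simple match/mismatch scoring and the linear gap penalty are all hypothetical choices made for brevity. A real protein comparison would normally use a substitution matrix and its own gap model, and the parameters used to build ProteinWorldDB are not stated in the abstract.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of sequences a and b.

    Returns (score, (a_start, a_end), (b_start, b_end)) with 0-based,
    half-open coordinates. The scoring parameters are illustrative
    assumptions, not those used to build ProteinWorldDB.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell to find the start.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1

    return best, (i, best_pos[0]), (j, best_pos[1])


if __name__ == "__main__":
    print(smith_waterman("GGGACGT", "ACGTCCC"))
```

The table has one cell per pair of residues, so time and memory grow with the product of the two sequence lengths. Repeating this for every pair among millions of sequences is what makes an all-against-all comparison so computationally intensive compared with heuristic search.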
