Heger Andreas, Korpelainen Eija, Hupponen Taavi, Mattila Kimmo, Ollikainen Vesa, Holm Liisa
MRC Functional Genetics Unit, University of Oxford, UK.
Nucleic Acids Res. 2008 Jan;36(Database issue):D276-80. doi: 10.1093/nar/gkm879. Epub 2007 Nov 5.
Sequence similarity/database searching is a cornerstone of molecular biology. PairsDB is a database intended to make exploring protein sequences and their similarity relationships quick and easy. Behind PairsDB is a comprehensive collection of protein sequences and the BLAST and PSI-BLAST alignments between them. Instead of running BLAST or PSI-BLAST individually for each request, results are retrieved instantaneously from a database of pre-computed alignments. Filtering options allow you to find a set of sequences satisfying a set of criteria, for example, all human proteins with solved structure and without transmembrane segments. PairsDB is continually updated and covers all sequences in UniProt. The data is stored in a MySQL relational database. Data files will be made available for download at ftp://nic.funet.fi/pub/sci/molbio. PairsDB can also be accessed interactively at http://pairsdb.csc.fi. PairsDB data is a valuable platform for building various downstream automated analysis pipelines. For example, the graph of all-against-all similarity relationships is the starting point for clustering protein families, delineating domains, improving alignment accuracy by consistency measures, and defining orthologous genes. Moreover, query-anchored stacked sequence alignments, profiles and consensus sequences are useful in studies of sequence conservation patterns for clues about possible functional sites.
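As a rough illustration of how an all-against-all similarity graph can serve as a starting point for clustering, the sketch below groups sequences by single-linkage clustering, that is, by taking connected components of the graph. It is a minimal example under stated assumptions: the `(query, hit, evalue)` record layout, the function name `cluster_by_similarity` and the `max_evalue` cutoff are hypothetical and do not reflect the actual PairsDB schema or the authors' clustering method.

```python
from collections import defaultdict


def cluster_by_similarity(pairs, max_evalue=1e-5):
    """Single-linkage clustering of sequences.

    pairs: iterable of (query, hit, evalue) tuples (hypothetical layout,
    not the PairsDB schema). Two sequences are linked when their E-value
    is at most max_evalue; clusters are the connected components.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, hit, evalue in pairs:
        # Register both sequences even when the link itself is rejected.
        root_q, root_h = find(query), find(hit)
        if evalue <= max_evalue and root_q != root_h:
            parent[root_h] = root_q

    clusters = defaultdict(set)
    for seq in list(parent):
        clusters[find(seq)].add(seq)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: made-up identifiers and E-values, for illustration only.
example_pairs = [
    ("A", "B", 1e-30),
    ("B", "C", 1e-10),
    ("C", "D", 0.5),    # too weak to link C and D
    ("E", "F", 1e-8),
]
print(cluster_by_similarity(example_pairs))
# [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Single linkage is only the simplest choice here; it tends to merge multi-domain proteins into oversized clusters, which is one reason domain delineation appears alongside clustering in the list of downstream analyses.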