Madera Martin, Vogel Christine, Kummerfeld Sarah K, Chothia Cyrus, Gough Julian
MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D235-9. doi: 10.1093/nar/gkh117.
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysing the results. At the core of the database is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of a superfamily of interest across all completely sequenced genomes, investigate which other superfamilies it combines with and retrieve the proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, users can first determine its superfamily composition and then compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the web server provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
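The two coverage figures quoted in the abstract (the fraction of proteins with at least one match, and the fraction of residues covered by assignments) can be illustrated with a minimal sketch. This is not the SUPERFAMILY pipeline: the `coverage` function, the data layout and the toy proteins are hypothetical, and overlapping assigned regions are assumed to count each residue only once.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: protein id -> (sequence length, assigned regions).
# Regions are (start, end) pairs, 1-based and inclusive.
Assignments = Dict[str, Tuple[int, List[Tuple[int, int]]]]


def coverage(assignments: Assignments) -> Tuple[float, float]:
    """Return (fraction of proteins with a match, fraction of residues covered)."""
    matched_proteins = 0
    total_residues = 0
    covered_residues = 0
    for length, regions in assignments.values():
        total_residues += length
        if regions:
            matched_proteins += 1
        # Use a set so residues in overlapping regions are counted once.
        covered = set()
        for start, end in regions:
            covered.update(range(start, end + 1))
        covered_residues += len(covered)
    return matched_proteins / len(assignments), covered_residues / total_residues


# Toy data, invented purely for illustration.
toy: Assignments = {
    "protA": (100, [(1, 50)]),
    "protB": (200, [(10, 109), (100, 159)]),  # two overlapping assignments
    "protC": (100, []),                       # no match
}

protein_frac, residue_frac = coverage(toy)
print(f"proteins with a match: {protein_frac:.2f}")
print(f"residues covered:      {residue_frac:.2f}")
```

Counting residues through a set keeps the sketch simple; for genome-scale data an interval-merging approach would use far less memory.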