Pandit Shashi B, Bhadra Rana, Gowri V S, Balaji S, Anand B, Srinivasan N
Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India.
BMC Bioinformatics. 2004 Mar 15;5:28. doi: 10.1186/1471-2105-5-28.
The SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated by profile matching, yielding sequence superfamilies of known structure. All-against-all family profile matches are then made to deduce a list of new potential superfamilies of as yet unknown structure.
The current version of SUPFAM (release 1.4) incorporates significant enhancements and major developments compared to the earlier, basic version. In the present version we have used RPS-BLAST, which is robust and sensitive, for profile matching. The reliability of connections between protein families is better ensured than before through benchmarked criteria involving a strict e-value cut-off and a minimum alignment length condition. An e-value-based indication of the reliability of each connection is now presented in the database. Web access to an RPS-BLAST-based tool that associates a query sequence with one of the family profiles in SUPFAM is available with the current release. In terms of scientific content, the present release of SUPFAM is entirely reorganized with the use of 6190 Pfam families and 2317 structural families derived from SCOP. Owing to a steep increase in the number of sequence and structural families used in SUPFAM, the details of the scientific content in the present release are almost entirely complementary to the previous basic version. Of the 2286 families, we could relate 245 Pfam families with apparently no structural information to families of known 3-D structure, thus identifying new families in the existing superfamilies. Using the profiles of 3904 Pfam families of as yet unknown structure, an all-against-all comparison involving sequence-profile matching resulted in the clustering of 96 Pfam families into 39 new potential superfamilies.
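As a rough illustration of the two steps described above (filtering family connections by an e-value cut-off and a minimum alignment length, then grouping connected families into potential superfamilies), a minimal Python sketch follows. It is not the SUPFAM implementation: the threshold values, the tuple layout of a hit, the family names, and the use of connected components as the grouping rule are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical thresholds: the abstract mentions a strict e-value cut-off and
# a minimum alignment length but does not give the actual values.
MAX_EVALUE = 1e-3
MIN_ALIGNED_LENGTH = 50


def reliable_hits(hits, max_evalue=MAX_EVALUE, min_length=MIN_ALIGNED_LENGTH):
    """Keep (query, target, evalue, aligned_length) hits passing both criteria."""
    return [
        hit for hit in hits
        if hit[0] != hit[1] and hit[2] <= max_evalue and hit[3] >= min_length
    ]


def cluster_families(hits):
    """Group families linked by hits into connected components (union-find).

    Connected components are an assumed grouping rule for this sketch.
    """
    parent = {}

    def find(family):
        parent.setdefault(family, family)
        while parent[family] != family:
            parent[family] = parent[parent[family]]  # path halving
            family = parent[family]
        return family

    for query, target, _evalue, _length in hits:
        root_query, root_target = find(query), find(target)
        if root_query != root_target:
            parent[root_query] = root_target

    groups = defaultdict(set)
    for family in list(parent):
        groups[find(family)].add(family)
    return sorted(sorted(group) for group in groups.values())


# Made-up family names and scores, for illustration only.
hits = [
    ("PF_A", "PF_B", 1e-10, 120),
    ("PF_B", "PF_C", 1e-5, 80),
    ("PF_D", "PF_E", 0.5, 200),   # rejected: e-value above the cut-off
    ("PF_F", "PF_G", 1e-8, 20),   # rejected: alignment too short
    ("PF_H", "PF_I", 1e-6, 60),
]
clusters = cluster_families(reliable_hits(hits))
print(clusters)
```

With these made-up hits, PF_A, PF_B and PF_C fall into one group because PF_B links the other two, which mirrors how a chain of pairwise profile matches can connect families that do not match each other directly.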
SUPFAM presents many non-trivial superfamily relationships of sequence families involved in a variety of functions and hence the information content is of interest to a wide scientific community. The grouping of related proteins without a known structure in SUPFAM is useful in identifying priority targets for structural genomics initiatives and in the assignment of putative functions. Database URL: http://pauling.mbu.iisc.ernet.in/~supfam.