DDBASE2.0: updated domain database with improved identification of structural domains.

Author information

Vinayagam A, Shi J, Pugalenthi G, Meenakshi B, Blundell T L, Sowdhamini R

Affiliation

National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK campus, Bellary Road, Bangalore, Karnataka 560 065, India.

Publication information

Bioinformatics. 2003 Sep 22;19(14):1760-4. doi: 10.1093/bioinformatics/btg233.

DOI:10.1093/bioinformatics/btg233

PMID:14512346

Abstract

MOTIVATION

Although many methods are available for the identification of structural domains from protein three-dimensional structures, accurate definition of protein domains and the curation of such data for a large number of proteins are often possible only after manual intervention. The availability of domain definitions for protein structural entries is useful for the sequence analysis of aligned domains, structure comparison, fold recognition procedures and understanding protein folding, domain stability and flexibility.

RESULTS

We have improved our method of domain identification, which starts from the concept of clustering secondary structural elements, with the aim of reducing the number of discontinuous segments in identified domains. The results of our modified, automatic approach have been compared with the domain definitions from other databases. On a test data set of 55 proteins, the method achieves high agreement (88%) in the number of domains with the crystallographers' definitions and with resources such as the SCOP, CATH, DALI, 3Dee and PDP databases. It also obtains a 98% overlap score with the other resources in the definition of domain boundaries for the 55 proteins. Using the improved method, we have examined the domain arrangements of 4592 non-redundant protein chains to include 5409 domains, leading to an update of the structural domain database.
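
The abstract does not define how the overlap score for domain boundaries is calculated. As a rough illustration only, the Python sketch below shows one plausible way to compare two domain assignments for the same chain: pair the domains one-to-one so that the number of shared residues is maximal, then report that number as a percentage of the reference residues. The boundary string format ('/' between domains, ',' between segments of a discontinuous domain) and the names parse_domains and overlap_score are assumptions made for this example, not the authors' definition or the DDBASE file format.

```python
from itertools import permutations


def parse_domains(spec):
    """Parse a hypothetical boundary string such as '1-50,120-200/51-119'.

    '/' separates domains and ',' separates the segments of a
    discontinuous domain. Returns one set of residue numbers per domain.
    """
    domains = []
    for dom in spec.split("/"):
        residues = set()
        for seg in dom.split(","):
            start, end = (int(x) for x in seg.split("-"))
            residues.update(range(start, end + 1))
        domains.append(residues)
    return domains


def overlap_score(assigned, reference):
    """Percentage of reference residues shared under the best one-to-one
    pairing of assigned and reference domains (illustrative definition).

    Brute force over pairings; adequate for the few domains in a chain.
    """
    total = sum(len(d) for d in reference)
    # Pair every domain of the shorter list with a distinct domain of the longer.
    small, large = sorted((assigned, reference), key=len)
    best = 0
    for perm in permutations(large, len(small)):
        shared = sum(len(a & b) for a, b in zip(small, perm))
        best = max(best, shared)
    return 100.0 * best / total


if __name__ == "__main__":
    reference = parse_domains("1-100/101-200")
    assigned = parse_domains("1-90/91-200")
    print(overlap_score(assigned, reference))
```

In this made-up two-domain chain the boundary is shifted by ten residues, so 190 of the 200 reference residues fall in the matching domain. Other definitions (for example, scoring per domain and averaging) would give different numbers.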

AVAILABILITY

The latest version of the domain database and online domain identification methods are available from http://www.ncbs.res.in/~faculty/mini/ddbase/ddbase.html

SUPPLEMENTARY INFORMATION

http://www.ncbs.res.in/~faculty/mini/ddbase/supplementary/supplementary.html
