DDBASE2.0: updated domain database with improved identification of structural domains.

Author information

Vinayagam A, Shi J, Pugalenthi G, Meenakshi B, Blundell T L, Sowdhamini R

Affiliation

National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK campus, Bellary Road, Bangalore, Karnataka 560 065, India.

Publication information

Bioinformatics. 2003 Sep 22;19(14):1760-4. doi: 10.1093/bioinformatics/btg233.

Abstract

MOTIVATION

Although many methods are available for the identification of structural domains from protein three-dimensional structures, accurate definition of protein domains and the curation of such data for a large number of proteins are often possible only after manual intervention. The availability of domain definitions for protein structural entries is useful for the sequence analysis of aligned domains, structure comparison, fold recognition procedures and understanding protein folding, domain stability and flexibility.

RESULTS

We have improved our method of domain identification, which starts from the concept of clustering secondary structural elements, with the aim of reducing the number of discontinuous segments in identified domains. The results of our modified, automatic approach have been compared with the domain definitions from other databases. On a test data set of 55 proteins, this method achieves high agreement (88%) in the number of domains with the crystallographers' definitions and with resources such as the SCOP, CATH, DALI, 3Dee and PDP databases. It also obtains a 98% overlap score with the other resources in the definition of domain boundaries for the 55 proteins. We have examined the domain arrangements of 4592 non-redundant protein chains using the improved method, yielding 5409 domains and leading to an update of the structural domain database.
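
The abstract does not define how the overlap score is computed, so the following is only a minimal sketch of one plausible per-residue measure, not the authors' formula. It assumes each domain is given as a list of inclusive residue segments (which allows discontinuous domains), and the names `expand`, `overlap_score` and `same_domain_count` are hypothetical.

```python
from itertools import permutations


def expand(segments):
    """Turn [(start, end), ...] inclusive segments into a set of residue numbers."""
    residues = set()
    for start, end in segments:
        residues.update(range(start, end + 1))
    return residues


def same_domain_count(assignment_a, assignment_b):
    """True when both assignments split the chain into the same number of domains."""
    return len(assignment_a) == len(assignment_b)


def overlap_score(assignment_a, assignment_b):
    """Fraction of residues that fall in matching domains.

    Assumption: domains are paired one-to-one so that the shared residue
    count is maximal; this is an illustrative definition only.
    """
    doms_a = [expand(d) for d in assignment_a]
    doms_b = [expand(d) for d in assignment_b]
    # Pad the shorter assignment with empty domains so a pairing always exists.
    n = max(len(doms_a), len(doms_b))
    doms_a += [set()] * (n - len(doms_a))
    doms_b += [set()] * (n - len(doms_b))
    total = len(set().union(*doms_a, *doms_b))
    if total == 0:
        return 0.0
    # Brute-force search over pairings; fine for the handful of domains per chain.
    best = max(
        sum(len(a & b) for a, b in zip(doms_a, perm))
        for perm in permutations(doms_b)
    )
    return best / total


# Two hypothetical two-domain assignments whose boundary differs by 10 residues.
ours = [[(1, 100)], [(101, 200)]]
reference = [[(1, 90)], [(91, 200)]]
print(same_domain_count(ours, reference))  # True
print(overlap_score(ours, reference))      # 0.95
```

The exhaustive pairing keeps the sketch short; for chains with many domains an assignment solver would be the more practical choice.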

AVAILABILITY

The latest version of the domain database and online domain identification methods are available from http://www.ncbs.res.in/~faculty/mini/ddbase/ddbase.html

SUPPLEMENTARY INFORMATION

http://www.ncbs.res.in/~faculty/mini/ddbase/supplementary/supplementary.html
