Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities.

Authors

Gracy J, Argos P

Affiliation

European Molecular Biology Laboratory, Heidelberg, Germany.

Publication

Bioinformatics. 1998;14(2):174-87. doi: 10.1093/bioinformatics/14.2.174.

PMID:9545450

Abstract

MOTIVATION

Decomposing each protein into modular domains is a basic prerequisite to classify accurately structural units in biological molecules. Boundaries between domains are indicated by two similar amino acid sequence segments located within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini.
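
The positional argument for the homologous-protein case can be illustrated with a minimal Python sketch. It is not the authors' procedure: the `LocalMatch` record, the `boundary_hints` function, and the `min_offset` threshold of 30 residues are all hypothetical names and values chosen for illustration. The idea shown is only that when two similar segments lie at notably different distances from the N- or C-terminus, the protein with the longer flanking region probably carries an extra module there, and the shorter protein's terminus, overlaid on the longer one, suggests where that module ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalMatch:
    """Hypothetical record of one pairwise local similarity.

    Segment [a_start, a_end) of protein A matches segment
    [b_start, b_end) of protein B (0-based residue indices).
    """
    a_start: int
    a_end: int
    a_len: int
    b_start: int
    b_end: int
    b_len: int


def boundary_hints(m: LocalMatch, min_offset: int = 30) -> dict:
    """Suggest candidate domain boundaries from differing terminal distances.

    min_offset is an assumed threshold for "notably different".
    """
    hints = {"A": [], "B": []}

    # Residues preceding the matched segment in each protein.
    n_a, n_b = m.a_start, m.b_start
    if abs(n_a - n_b) >= min_offset:
        # Overlay the shorter protein's N-terminus on the longer one.
        if n_a > n_b:
            hints["A"].append(m.a_start - n_b)
        else:
            hints["B"].append(m.b_start - n_a)

    # Residues following the matched segment in each protein.
    c_a, c_b = m.a_len - m.a_end, m.b_len - m.b_end
    if abs(c_a - c_b) >= min_offset:
        # Overlay the shorter protein's C-terminus on the longer one.
        if c_a > c_b:
            hints["A"].append(m.a_end + c_b)
        else:
            hints["B"].append(m.b_end + c_a)

    return hints


example = LocalMatch(a_start=200, a_end=300, a_len=310,
                     b_start=10, b_end=110, b_len=400)
print(boundary_hints(example))
```

In this made-up example, protein A has a long N-terminal extension and protein B a long C-terminal extension, so one candidate boundary is reported in each protein. Repeats within a single protein would need separate handling, which this sketch leaves out.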

RESULTS

We have developed an automated method that combines such positional constraints derived from various detected pairwise sequence similarities to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and shared <60% sequence identity amongst pairs. The resultant clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of the domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, domain boundaries and multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.
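
One simple way to picture how constraints from many pairwise similarities might be combined is to pool the candidate boundary positions proposed for a protein and keep only those supported by several independent comparisons. The Python sketch below does this by grouping nearby positions. It is an assumed illustration, not the method described in the paper: the function name `consensus_boundaries` and the `tolerance` and `min_support` parameters are hypothetical.

```python
from statistics import median


def consensus_boundaries(positions, tolerance=10, min_support=2):
    """Group nearby candidate boundaries and keep well-supported groups.

    positions   : candidate boundary positions (residue indices) for one protein
    tolerance   : assumed maximum gap between neighbours within one group
    min_support : assumed minimum number of candidates needed to keep a group
    """
    clusters = []
    for p in sorted(positions):
        # Extend the current group if p is close to its last member.
        if clusters and p - clusters[-1][-1] <= tolerance:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    # Report the median position of each sufficiently supported group.
    return [int(median(c)) for c in clusters if len(c) >= min_support]


candidates = [190, 188, 195, 400, 52, 55]
print(consensus_boundaries(candidates))
```

Here the isolated candidate at position 400 is discarded because only one comparison proposed it, while the two groups of nearby candidates each yield one consensus boundary.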

AVAILABILITY

The resulting database, called DOMO, is available through the database search routine SRS at Infobiogen (http://www.infobiogen.fr/srs5/), EBI (http://srs.ebi.ac.uk:5000/) and EMBL (http://www.embl-heidelberg.de/srs5/) World Wide Web sites.

CONTACT

gracy@infobiogen.fr
