Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities.

Authors

Gracy J, Argos P

Affiliation

European Molecular Biology Laboratory, Heidelberg, Germany.

Publication

Bioinformatics. 1998;14(2):174-87. doi: 10.1093/bioinformatics/14.2.174.

PMID:9545450
Abstract

MOTIVATION

Decomposing each protein into modular domains is a basic prerequisite for accurately classifying structural units in biological molecules. Boundaries between domains are indicated by two similar amino acid sequence segments located either within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini.
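
The positional idea above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the `Hit` record, the `boundary_candidates` function, and the `min_diff` threshold of 30 residues are all hypothetical choices made for illustration. Given one local similarity between two homologous proteins, it flags a candidate boundary wherever the similar segment sits notably further from the N- or C-terminus in one protein than in the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Local similarity between residues a_start..a_end of protein A and
    b_start..b_end of protein B (1-based, inclusive). Hypothetical record."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int


def boundary_candidates(hit, len_a, len_b, min_diff=30):
    """Return (protein, position) pairs where a domain boundary is suggested.

    min_diff is an assumed threshold for "notably different" distances.
    """
    candidates = []

    # Residues preceding the similar segment in each protein.
    n_a, n_b = hit.a_start - 1, hit.b_start - 1
    if n_a - n_b >= min_diff:
        # A has extra N-terminal material, so a boundary is suggested
        # where the similar segment starts in A.
        candidates.append(("A", hit.a_start))
    elif n_b - n_a >= min_diff:
        candidates.append(("B", hit.b_start))

    # Residues following the similar segment in each protein.
    c_a, c_b = len_a - hit.a_end, len_b - hit.b_end
    if c_a - c_b >= min_diff:
        # A has extra C-terminal material after the similar segment.
        candidates.append(("A", hit.a_end))
    elif c_b - c_a >= min_diff:
        candidates.append(("B", hit.b_end))

    return candidates


example = boundary_candidates(Hit(120, 200, 5, 85), len_a=210, len_b=300)
print(example)
```

In this made-up example the segment starts at residue 120 in A but at residue 5 in B, so a boundary is suggested at position 120 of A; B extends far beyond the segment's end, so a second candidate falls at position 85 of B. Internal repeats could be handled the same way by treating both segments as belonging to one protein.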

RESULTS

We have developed an automated method that combines such positional constraints, derived from the detected pairwise sequence similarities, to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and which share <60% pairwise sequence identity. The resultant clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of the domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, domain boundaries and multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.
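
The abstract does not say how the positional constraints from many pairwise similarities are combined. One simple way to picture it, offered only as an assumption-laden sketch and not as the DOMO algorithm, is to pool the candidate boundary positions proposed for one protein, group positions that lie close together, and keep groups supported by several independent similarities. The function name `consensus_boundaries` and the parameters `tolerance` and `min_support` are hypothetical.

```python
from statistics import median


def consensus_boundaries(positions, tolerance=10, min_support=2):
    """Group nearby candidate positions and return (position, support) pairs.

    Positions are chained into one group while each lies within `tolerance`
    residues of the previous one (single-linkage grouping in one dimension).
    Groups with fewer than `min_support` members are discarded.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= tolerance:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    # Report the median of each sufficiently supported group, truncated
    # to an integer residue position.
    return [(int(median(g)), len(g)) for g in groups if len(g) >= min_support]


# Candidate boundaries proposed for one protein by six pairwise hits.
print(consensus_boundaries([118, 120, 125, 300, 410, 415]))
```

Here the isolated candidate at position 300 is dropped because only one similarity supports it, while the clusters around positions 120 and 412 are retained. A real system would also need to reconcile boundaries across all members of a family before building multiple alignments.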

AVAILABILITY

The resulting database, called DOMO, is available through the database search routine SRS at Infobiogen (http://www.infobiogen.fr/srs5/), EBI (http://srs.ebi.ac.uk:5000/) and EMBL (http://www.embl-heidelberg.de/srs5/) World Wide Web sites.

CONTACT

gracy@infobiogen.fr
