Protein family clustering for structural genomics.

Author information

Yan Yongpan, Moult John

Affiliation

Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Rockville, MD 20850, USA.

Publication information

J Mol Biol. 2005 Oct 28;353(3):744-59. doi: 10.1016/j.jmb.2005.08.058. Epub 2005 Sep 9.

PMID: 16185712

Abstract

A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Benchmarking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
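The closing claim of the abstract, that most domains can be covered by systematically sampling the larger families, rests on simple cumulative arithmetic over a skewed family-size distribution. The Python sketch below illustrates that arithmetic only. Its assumptions are ours, not the authors': each sequence counts as one domain, one solved structure serves as a template for every member of its family, and the family sizes are a hypothetical toy list rather than data from the study. The function names `coverage_by_largest_families` and `structures_needed` are illustrative.

```python
def coverage_by_largest_families(family_sizes, n_structures):
    """Fraction of domains covered if one structure is solved for each of
    the n_structures largest families (hypothetical model: one structure
    templates every member of its family; one sequence = one domain)."""
    total = sum(family_sizes)
    if total == 0:
        return 0.0
    largest_first = sorted(family_sizes, reverse=True)
    return sum(largest_first[:n_structures]) / total


def structures_needed(family_sizes, target_fraction):
    """Smallest number of families, taken largest first, whose members
    together reach target_fraction of all domains."""
    total = sum(family_sizes)
    covered = 0
    for k, size in enumerate(sorted(family_sizes, reverse=True), start=1):
        covered += size
        if covered >= target_fraction * total:
            return k
    # Target not reachable (or no families): every family would be needed.
    return len(family_sizes)


if __name__ == "__main__":
    # Hypothetical toy distribution: a few large families, many singletons.
    sizes = [50, 20, 10, 5] + [1] * 16
    print(f"{coverage_by_largest_families(sizes, 3):.3f}")  # 0.792
    print(structures_needed(sizes, 0.75))                   # 3
```

In this toy distribution three structures already cover about 79% of the domains, while the 16 singleton families add little; this is only a miniature of the sampling strategy the abstract describes, not a reproduction of its estimates.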

Similar articles

[1]
Protein family clustering for structural genomics.

J Mol Biol. 2005-10-28

[2]
Progress of structural genomics initiatives: an analysis of solved target structures.

J Mol Biol. 2005-5-20

[3]
Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization.

Proteins. 2007-3-1

[4]
Defining the fold space of membrane proteins: the CAMPS database.

Proteins. 2006-9-1

[5]
Identification and distribution of protein families in 120 completed genomes using Gene3D.

Proteins. 2005-5-15

[6]
An overview of structural genomics.

Nat Struct Biol. 2000-11

[7]
The number of protein folds and their distribution over families in nature.

Proteins. 2004-2-15

[8]
Solution NMR in structural genomics.

Curr Opin Struct Biol. 2006-10

[9]
Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

Proteins. 2008-9

[10]
[Development of antituberculous drugs: current status and future prospects].

Kekkaku. 2006-12

Cited by

[1]
On the origin of protein superfamilies and superfolds.

Sci Rep. 2015-2-23

[2]
An estimated 5% of new protein structures solved today represent a new Pfam family.

Acta Crystallogr D Biol Crystallogr. 2013-11

[3]
Composition bias and the origin of ORFan genes.

Bioinformatics. 2010-3-15

[4]
Characterization of proteins with wide-angle X-ray solution scattering (WAXS).

J Struct Funct Genomics. 2010-3

[5]
Structural genomics: keeping up with expanding knowledge of the protein universe.

Curr Opin Struct Biol. 2007-6

[6]
Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint.

BMC Bioinformatics. 2007-3-9

[7]
Using phylogeny to improve genome-wide distant homology recognition.

PLoS Comput Biol. 2007-1-19

[8]
A limited universe of membrane protein families and folds.

Protein Sci. 2006-7
