Internal organization of large protein families: relationship between the sequence, structure, and function-based clustering.

Author information

Joint Center for Structural Genomics, Center for Research in Biological Systems, University of California, San Diego, California 92093-0446, USA.

Publication information

Proteins. 2011 Aug;79(8):2389-402. doi: 10.1002/prot.23049. Epub 2011 May 31.

DOI: 10.1002/prot.23049
PMID: 21671455
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3132221/
Abstract

The protein universe can be organized into families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.
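
The effect of a fixed threshold on cluster counts can be illustrated with a minimal sketch. This is not one of the three algorithms compared in the paper; it is a generic greedy, representative-based clustering on pairwise identity, and the toy sequences, the function names `percent_identity` and `greedy_cluster`, and the threshold values are all assumptions chosen for illustration. It only shows that a strict fixed identity cutoff splits related sequences into many clusters, while a permissive one merges them.

```python
from typing import List

# Hypothetical toy "family": four pre-aligned sequences of equal length.
TOY_FAMILY = [
    "ACDEFGHIKL",  # 0: reference
    "ACDEFGHIKV",  # 1: 90% identical to 0
    "ACDEYGHLKV",  # 2: 70% identical to 0
    "MCDWYGHLRV",  # 3: 40% identical to 0, 70% identical to 2
]


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical, non-gap positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[int]]:
    """Assign each sequence to the first cluster whose representative
    (its first member) it matches at >= threshold identity; otherwise
    start a new cluster. Returns clusters as lists of sequence indices."""
    clusters: List[List[int]] = []
    for i, seq in enumerate(seqs):
        for cluster in clusters:
            if percent_identity(seqs[cluster[0]], seq) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    for cutoff in (0.8, 0.3):
        print(cutoff, greedy_cluster(TOY_FAMILY, cutoff))
```

With the strict cutoff the four toy sequences fall into three clusters; with the permissive cutoff they form one. Real profile-sequence or profile-profile comparisons use position-specific scoring rather than raw identity, which this sketch does not attempt to model.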
