The number of protein folds and their distribution over families in nature.

Authors

Liu Xinsheng, Fan Ke, Wang Wei

Affiliation

National Lab of Solid State Microstructure, Department of Physics and Institute of Biophysics, Nanjing University, Nanjing, China.

Publication

Proteins. 2004 Feb 15;54(3):491-9. doi: 10.1002/prot.10514.

DOI: 10.1002/prot.10514
PMID: 14747997
Abstract

Currently, of the 10⁶ known protein sequences, only about 10⁴ structures have been solved. Based on homology and similarity, proteins are grouped into families. Each family has a structural prototype, its fold, and some families share the same fold. However, the total numbers of folds and families, and the distribution of folds over families in nature, remain an enigma. Here, we estimate the distribution of folds over families and the total number of folds in nature, using a maximum probability principle and the method of moments. We find a quadratic relation between the numbers of families and folds when the number of families lies between 6000 and 30,000. For example, 23,100 families yield about 2700 folds. These include about 33 superfolds, each containing more than 100 families, and the largest superfold comprises about 800 families. Our results suggest that most folds contain only a single family. However, considerably more folds contain many families each than the database suggests, and the distribution of folds over families in nature differs markedly from the sampled distribution. This article provides the first estimate of the long tail of the fold distribution. The results fit data from several versions of the Structural Classification of Proteins (SCOP) very well, and goodness-of-fit tests strongly support them. In addition, the method of directly "enlarging" a sample to the population may be useful for inferring species distributions in other fields.
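The abstract contrasts the distribution of folds over families in nature with the sampled distribution seen in a database such as SCOP. The sketch below tallies the sampled quantities the abstract mentions: families per fold, the number of single-family folds, and superfolds with more than 100 families. It assumes the database has already been reduced to (fold, family) ID pairs. The function names and toy IDs are illustrative assumptions. The sketch only describes the sample and does not reproduce the authors' maximum-probability or moment-based estimation of the population distribution.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def families_per_fold(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct families assigned to each fold (duplicate pairs are ignored)."""
    families = defaultdict(set)
    for fold, family in pairs:
        families[fold].add(family)
    return {fold: len(fams) for fold, fams in families.items()}


def fold_size_distribution(sizes: Dict[str, int]) -> Dict[int, int]:
    """Map k -> number of folds that contain exactly k families (sampled distribution)."""
    return dict(sorted(Counter(sizes.values()).items()))


def superfolds(sizes: Dict[str, int], threshold: int = 100) -> List[str]:
    """Folds with more than `threshold` families (100 is the abstract's superfold cut-off)."""
    return sorted(fold for fold, n in sizes.items() if n > threshold)


if __name__ == "__main__":
    # Toy, made-up IDs; a real run would parse (fold, family) pairs from a SCOP release.
    toy = [("f1", "a"), ("f1", "b"), ("f2", "c"), ("f3", "d"), ("f3", "e"), ("f3", "f")]
    sizes = families_per_fold(toy)
    print(sizes)                           # {'f1': 2, 'f2': 1, 'f3': 3}
    print(fold_size_distribution(sizes))   # {1: 1, 2: 1, 3: 1}
    print(superfolds(sizes, threshold=2))  # ['f3']
```

In this framing, `fold_size_distribution(sizes)[1]` counts the single-family folds, and the superfolds sit in the tail of the distribution. The abstract reports that the distribution in nature differs markedly from a sampled tally like this one.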


Similar articles

1
The number of protein folds and their distribution over families in nature.
Proteins. 2004 Feb 15;54(3):491-9. doi: 10.1002/prot.10514.
2
The size distribution of protein families within different types of folds.
Biochem Biophys Res Commun. 2011 Mar 11;406(2):218-22. doi: 10.1016/j.bbrc.2011.02.020. Epub 2011 Feb 15.
3
Monte Carlo estimation of the number of possible protein folds: effects of sampling bias and folds distributions.
Proteins. 2003 May 15;51(3):352-9. doi: 10.1002/prot.10336.
4
Protein structural domains: analysis of the 3Dee domains database.
Proteins. 2001 Feb 15;42(3):332-44.
5
Estimating the total number of protein folds.
Proteins. 1999 Jun 1;35(4):408-14.
6
Fold usage on genomes and protein fold evolution.
Proteins. 2005 Sep 1;60(4):690-700. doi: 10.1002/prot.20506.
7
Protein superfamilies and domain superfolds.
Nature. 1994 Dec 15;372(6507):631-4. doi: 10.1038/372631a0.
8
The structure of the protein universe and genome evolution.
Nature. 2002 Nov 14;420(6912):218-23. doi: 10.1038/nature01256.
9
Estimating the number of protein folds.
J Mol Biol. 1998 Dec 18;284(5):1301-5. doi: 10.1006/jmbi.1998.2282.
10
Comparison of sequence and structure-based datasets for nonredundant structural data mining.
Proteins. 2005 Sep 1;60(4):577-83. doi: 10.1002/prot.20505.

Cited by

1
A Fully In Silico Protocol to Understand Olfactory Receptor-Odorant Interactions.
ACS Omega. 2025 Jun 3;10(23):24030-24049. doi: 10.1021/acsomega.4c08181. eCollection 2025 Jun 17.
2
Analysis of proteins in the light of mutations.
Eur Biophys J. 2024 Aug;53(5-6):255-265. doi: 10.1007/s00249-024-01714-y. Epub 2024 Jul 2.
3
Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters.
Front Bioinform. 2023 Mar 3;3:1157956. doi: 10.3389/fbinf.2023.1157956. eCollection 2023.
4
Binding and Functional Folding (BFF): A Physiological Framework for Studying Biomolecular Interactions and Allostery.
J Mol Biol. 2022 Dec 15;434(23):167872. doi: 10.1016/j.jmb.2022.167872. Epub 2022 Oct 28.
5
The hypothetical periplasmic protein PA1624 from Pseudomonas aeruginosa folds into a unique two-domain structure.
Acta Crystallogr F Struct Biol Commun. 2020 Dec 1;76(Pt 12):609-615. doi: 10.1107/S2053230X20014612. Epub 2020 Nov 30.
6
Conceptual Evolution of Cell Signaling.
Int J Mol Sci. 2019 Jul 4;20(13):3292. doi: 10.3390/ijms20133292.
7
A global map of the protein shape universe.
PLoS Comput Biol. 2019 Apr 12;15(4):e1006969. doi: 10.1371/journal.pcbi.1006969. eCollection 2019 Apr.
8
Conformational diversity analysis reveals three functional mechanisms in proteins.
PLoS Comput Biol. 2017 Feb 13;13(2):e1005398. doi: 10.1371/journal.pcbi.1005398. eCollection 2017 Feb.
9
On the origin of protein superfamilies and superfolds.
Sci Rep. 2015 Feb 23;5:8166. doi: 10.1038/srep08166.
10
Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative.
Proc Natl Acad Sci U S A. 2014 Mar 11;111(10):3733-8. doi: 10.1073/pnas.1321614111. Epub 2014 Feb 24.