Grammar of protein domain architectures.

Affiliations

Department of Pathology, University of Alabama, Birmingham, AL 35249.

National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894.

Publication information

Proc Natl Acad Sci U S A. 2019 Feb 26;116(9):3636-3645. doi: 10.1073/pnas.1814684116. Epub 2019 Feb 7.

DOI:10.1073/pnas.1814684116
PMID:30733291
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6397568/
Abstract

From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the "proteome grammar", that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of "protein languages" in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life. The results of this analysis bolster the analogy between genomes and natural language and show that a "quasi-universal grammar" underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.
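
The idea of measuring information gain over n-grams of domains can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes 2-grams (adjacent domain pairs) only, takes the "random" model to be independent pairing of domains at their observed frequencies, and uses made-up domain labels. The function names `bigrams` and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2


def bigrams(architecture):
    """Return the adjacent domain pairs (2-grams) of one architecture."""
    return list(zip(architecture, architecture[1:]))


def information_gain(architectures):
    """Relative entropy (in bits) of observed 2-grams versus random pairing.

    Assumption for illustration: the random model pairs domains
    independently, each drawn with its observed overall frequency.
    """
    unigram = Counter(d for arch in architectures for d in arch)
    bigram = Counter(bg for arch in architectures for bg in bigrams(arch))
    n_uni = sum(unigram.values())
    n_bi = sum(bigram.values())

    gain = 0.0
    for (a, b), count in bigram.items():
        p_obs = count / n_bi                                   # observed 2-gram frequency
        p_rand = (unigram[a] / n_uni) * (unigram[b] / n_uni)   # independent pairing
        gain += p_obs * log2(p_obs / p_rand)
    return gain


# Toy proteome with invented domain labels: only two of the 16 possible
# ordered pairs ever occur, so the architectures are far from random.
toy_proteome = [["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"]]
print(information_gain(toy_proteome))
```

In this toy case every observed pair is eight times more frequent than independent pairing would predict, so the sketch reports a gain of 3 bits. The ∼1.2 bits quoted in the abstract comes from the authors' own analysis of real proteomes and is not reproduced by this example.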

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/128a/6397568/5e8f3c4f6ab3/pnas.1814684116fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/128a/6397568/decf17615933/pnas.1814684116fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/128a/6397568/609f68c1add0/pnas.1814684116fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/128a/6397568/6c4324eef269/pnas.1814684116fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/128a/6397568/f4f99bfec963/pnas.1814684116fig05.jpg
