The simplicity of protein sequence-function relationships.

Affiliations

Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA.

Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea.

Publication information

Nat Commun. 2024 Sep 11;15(1):7953. doi: 10.1038/s41467-024-51895-5.

DOI:10.1038/s41467-024-51895-5

PMID:39261454

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11390738/

Abstract

How complex are the rules by which a protein's sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, because they analyzed sequence-function relationships relative to a single reference sequence (which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis) or they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird's-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a minuscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins' genetic architecture.
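
The reference-free idea in the abstract can be illustrated on a toy landscape. The following is a minimal sketch, not the authors' implementation: it assumes a small, fully sampled, noise-free landscape, and it assumes that each term is defined relative to averages over all genotypes (the global mean, then the mean of genotypes sharing a state or a pair of states) rather than relative to one reference sequence. The global nonlinearity and any fitting procedure for noisy or incomplete data are omitted. All function names, the two-letter alphabet, and the synthetic effect sizes are hypothetical.

```python
from itertools import combinations, product
from statistics import mean

def reference_free_terms(phenotypes, n_sites, states):
    """Zero-, first-, and second-order terms defined by averaging over all genotypes."""
    e0 = mean(phenotypes.values())  # zero-order term: global mean phenotype
    e1 = {}
    for i in range(n_sites):
        for s in states:
            # Mean of all genotypes carrying state s at site i, relative to e0
            sub = mean(y for g, y in phenotypes.items() if g[i] == s)
            e1[(i, s)] = sub - e0
    e2 = {}
    for i, j in combinations(range(n_sites), 2):
        for s, t in product(states, repeat=2):
            # Mean of all genotypes carrying the pair, minus lower-order terms
            sub = mean(y for g, y in phenotypes.items() if g[i] == s and g[j] == t)
            e2[(i, s, j, t)] = sub - e0 - e1[(i, s)] - e1[(j, t)]
    return e0, e1, e2

def predict(g, e0, e1, e2, order):
    """Sum the terms up to the requested order for genotype g."""
    y = e0
    if order >= 1:
        y += sum(e1[(i, s)] for i, s in enumerate(g))
    if order >= 2:
        y += sum(e2[(i, g[i], j, g[j])] for i, j in combinations(range(len(g)), 2))
    return y

def variance_explained(phenotypes, e0, e1, e2, order):
    """Fraction of phenotypic variance captured by the model truncated at `order`."""
    ybar = mean(phenotypes.values())
    total = sum((y - ybar) ** 2 for y in phenotypes.values())
    resid = sum((y - predict(g, e0, e1, e2, order)) ** 2
                for g, y in phenotypes.items())
    return 1 - resid / total

# Synthetic landscape (hypothetical values): additive effects plus one pairwise term
states = "AB"
n_sites = 3
additive = [{"A": 0.0, "B": 1.0}, {"A": 0.0, "B": 0.5}, {"A": 0.0, "B": -0.3}]
phenotypes = {}
for g in product(states, repeat=n_sites):
    y = sum(additive[i][s] for i, s in enumerate(g))
    if g[0] == "B" and g[1] == "B":
        y += 0.8  # pairwise interaction between sites 0 and 1
    phenotypes[g] = y

e0, e1, e2 = reference_free_terms(phenotypes, n_sites, states)
r1 = variance_explained(phenotypes, e0, e1, e2, order=1)
r2 = variance_explained(phenotypes, e0, e1, e2, order=2)
print(f"first-order model explains {r1:.3f} of variance")
print(f"first- plus second-order model explains {r2:.3f} of variance")
```

Because every term is an average over many genotypes, noise in any single measurement is diluted instead of being propagated into higher-order terms, which is the intuition behind the robustness claim. In this toy case the pairwise model recovers the landscape exactly, since the synthetic phenotype contains no interaction above second order.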

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/601a/11390738/cf89ebf50a83/41467_2024_51895_Fig1_HTML.jpg
