Engineering Proteins Using Statistical Models of Coevolutionary Sequence Information.

Affiliations

The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.

The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.

Publication information

Cold Spring Harb Perspect Biol. 2024 Apr 1;16(4):a041463. doi: 10.1101/cshperspect.a041463.

PMID:38110247

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10982702/

Abstract

Homologous protein sequences are wonderfully diverse, indicating many possible evolutionary "solutions" to the encoding of function. Consequently, one can construct statistical models of protein sequence by analyzing amino acid frequency across a large multiple sequence alignment. A central premise is that covariance between amino acid positions reflects coevolution due to a shared functional or biophysical constraint. In this review, we describe the implementation and discuss the advantages, limitations, and recent progress on two coevolution-based modeling approaches: (1) Potts models of protein sequence (direct coupling analysis [DCA]-like), and (2) the statistical coupling analysis (SCA). Each approach detects interesting features of protein sequence and structure: the former emphasizes local physical contacts throughout the structure, while the latter identifies larger evolutionarily coupled networks of residues. Recent advances in large-scale gene synthesis and high-throughput functional selection now motivate additional work to benchmark model performance across quantitative function prediction and de novo design tasks.
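The premise that covariance between alignment positions reflects coevolution rests on two quantities estimated from a multiple sequence alignment. Potts/DCA-like models and SCA both build on them. The first is the single-site amino acid frequency f_i(a). The second is the pairwise frequency f_ij(a,b).

The Python sketch below computes both, then the connected covariance C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b). It summarizes each position pair with a Frobenius norm.

This is a minimal illustration under stated assumptions, not the method described in this review:

- Gaps are treated as a 21st symbol.
- Sequences are not reweighted for redundancy.
- No pseudocounts are added.
- The helper names (`coevolution_stats`, `coupling_score`, `potts_energy`) and the toy alignment are hypothetical.

`potts_energy` only evaluates the standard Potts energy for given fields `h` and couplings `J`. Inferring those parameters, which is the DCA step, is not shown.

```python
import numpy as np

# Assumption: 20 amino acids plus gap '-' as a 21st state (conventions vary).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
AA_INDEX = {aa: k for k, aa in enumerate(ALPHABET)}
Q = len(ALPHABET)


def one_hot_msa(msa):
    """Encode equal-length aligned sequences as an (M, L, Q) one-hot array."""
    seqs = np.array([[AA_INDEX[aa] for aa in s] for s in msa])
    return np.eye(Q)[seqs]


def coevolution_stats(msa):
    """Return f_i(a), f_ij(a,b) and C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b).

    Hypothetical helper: unweighted sequences, no pseudocounts.
    """
    x = one_hot_msa(msa)                                  # (M, L, Q)
    m = x.shape[0]
    f_i = x.mean(axis=0)                                  # (L, Q)
    f_ij = np.einsum("mia,mjb->iajb", x, x) / m           # (L, Q, L, Q)
    cov = f_ij - np.einsum("ia,jb->iajb", f_i, f_i)       # connected covariance
    return f_i, f_ij, cov


def coupling_score(cov):
    """Frobenius norm of each Q x Q covariance block, giving an (L, L) map."""
    score = np.sqrt((cov ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(score, 0.0)  # ignore self-pairs
    return score


def potts_energy(seq, h, J):
    """Potts energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    h has shape (L, Q) and J has shape (L, Q, L, Q); both are assumed given.
    """
    s = [AA_INDEX[aa] for aa in seq]
    n = len(s)
    e = -sum(h[i, s[i]] for i in range(n))
    e -= sum(J[i, s[i], j, s[j]] for i in range(n) for j in range(i + 1, n))
    return e


if __name__ == "__main__":
    # Hypothetical toy alignment: positions 0 and 2 covary, position 1 does not.
    msa = ["AKE", "ALE", "GKD", "GLD"]
    f_i, f_ij, cov = coevolution_stats(msa)
    print(np.round(coupling_score(cov), 3))
```

In this toy alignment, positions 0 and 2 covary perfectly: A pairs with E, and G pairs with D. Position 1 varies independently of both. As a result, the score for the pair (0, 2) is 0.5, and the other off-diagonal scores are 0.

Raw covariance like this mixes direct and indirect correlations. DCA-like Potts inference aims to separate the two. SCA, broadly speaking, instead weights covariance by positional conservation and looks for groups of coupled positions.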

Similar articles

Direct coevolutionary couplings reflect biophysical residue interactions in proteins.

J Chem Phys. 2016 Nov 7;145(17):174102. doi: 10.1063/1.4966156.

Coevolution-based inference of amino acid interactions underlying protein function.

Elife. 2018 Jul 20;7:e34300. doi: 10.7554/eLife.34300.

Constructing sequence-dependent protein models using coevolutionary information.

Protein Sci. 2016 Jan;25(1):111-22. doi: 10.1002/pro.2758. Epub 2015 Aug 10.

Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores.

Proteins. 2015 Dec;83(12):2293-306. doi: 10.1002/prot.24948. Epub 2015 Nov 17.

Direct-coupling analysis of residue coevolution captures native contacts across many protein families.

Proc Natl Acad Sci U S A. 2011 Dec 6;108(49):E1293-301. doi: 10.1073/pnas.1111471108. Epub 2011 Nov 21.

Undersampling and the inference of coevolution in proteins.

Cell Syst. 2023 Mar 15;14(3):210-219.e7. doi: 10.1016/j.cels.2022.12.013. Epub 2023 Jan 23.

Sequence coevolution between RNA and protein characterized by mutual information between residue triplets.

PLoS One. 2012;7(1):e30022. doi: 10.1371/journal.pone.0030022. Epub 2012 Jan 18.

J Ind Microbiol Biotechnol. 2017 May;44(4-5):687-695. doi: 10.1007/s10295-016-1811-1. Epub 2016 Aug 11.

Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis.

Proc Natl Acad Sci U S A. 2016 Oct 25;113(43):12186-12191. doi: 10.1073/pnas.1607570113. Epub 2016 Oct 11.

Cited by

Considering Metabolic Context in Enzyme Evolution and Design.

Biochemistry. 2025 Aug 19;64(16):3495-3507. doi: 10.1021/acs.biochem.5c00165. Epub 2025 Aug 5.

Using AlphaFold2 to Predict the Conformations of Side Chains in Folded Proteins.

bioRxiv. 2025 Feb 14:2025.02.10.637534. doi: 10.1101/2025.02.10.637534.

Protein stability is determined by single-site bias rather than pairwise covariance.

bioRxiv. 2025 Jan 14:2025.01.09.632118. doi: 10.1101/2025.01.09.632118.

Sequence-Based Protein Design: A Review of Using Statistical Models to Characterize Coevolutionary Traits for Developing Hybrid Proteins as Genetic Sensors.

Int J Mol Sci. 2024 Jul 30;25(15):8320. doi: 10.3390/ijms25158320.
