Mega-scale experimental analysis of protein folding stability in biology and design.

Author information

Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

Center for Synthetic Biology, Northwestern University, Evanston, IL, USA.

Publication information

Nature. 2023 Aug;620(7973):434-444. doi: 10.1038/s41586-023-06328-6. Epub 2023 Jul 19.

Abstract

Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale. However, the energetics driving folding are invisible in these structures and remain largely unknown. The hidden thermodynamics of folding can drive disease, shape protein evolution and guide protein engineering, and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40-72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.
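The abstract does not say how the thermodynamic couplings between sites were computed. As a minimal sketch, assuming the standard double-mutant-cycle definition (the coupling is the part of the double mutant's stability change that the two single mutants do not account for additively), the arithmetic looks like this. The function name, argument names and the numbers are hypothetical and are not taken from the paper.

```python
def coupling_energy(dg_wt, dg_a, dg_b, dg_ab):
    """Double-mutant-cycle coupling energy.

    All inputs are folding stabilities (delta-G of unfolding) in the same
    units, e.g. kcal/mol. This is an assumed, textbook definition, not
    necessarily the exact procedure used by the authors.
    """
    ddg_a = dg_a - dg_wt    # effect of mutation A alone
    ddg_b = dg_b - dg_wt    # effect of mutation B alone
    ddg_ab = dg_ab - dg_wt  # effect of both mutations together
    # Zero means the two sites behave additively (no coupling).
    return ddg_ab - (ddg_a + ddg_b)


# Hypothetical stabilities in kcal/mol, for illustration only.
example = coupling_energy(dg_wt=3.0, dg_a=2.0, dg_b=1.5, dg_ab=1.0)
```

With these illustrative inputs the single mutants would predict a combined loss of 2.5 kcal/mol, while the double mutant loses only 2.0 kcal/mol, so the coupling is 0.5 kcal/mol.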

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/914a/10412457/5f90b9e59a9a/41586_2023_6328_Fig1_HTML.jpg
