Ferrari Állan J R, Dixit Sugyan M, Thibeault Jane, Garcia Mario, Houliston Scott, Ludwig Robert W, Notin Pascal, Phoumyvong Claire M, Martell Cydney M, Jung Michelle D, Tsuboyama Kotaro, Carter Lauren, Arrowsmith Cheryl H, Guttman Miklos, Rocklin Gabriel J
Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
Structural Genomics Consortium, University of Toronto, Toronto, ON M5G 1L7, Canada; Princess Margaret Cancer Centre, University of Toronto, Toronto, ON M5G 2M9, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 2M9, Canada.
bioRxiv. 2025 Mar 25:2025.03.20.644235. doi: 10.1101/2025.03.20.644235.
All folded proteins continuously fluctuate between their low-energy native structures and higher energy conformations that can be partially or fully unfolded. These rare states influence protein function, interactions, aggregation, and immunogenicity, yet they remain far less understood than protein native states. Although native protein structures are now often predictable with impressive accuracy, conformational fluctuations and their energies remain largely invisible and unpredictable, and experimental challenges have prevented large-scale measurements that could improve machine learning and physics-based modeling. Here, we introduce a multiplexed experimental approach to analyze the energies of conformational fluctuations for hundreds of protein domains in parallel using intact protein hydrogen-deuterium exchange mass spectrometry. We analyzed 5,778 domains 28-64 amino acids in length, revealing hidden variation in conformational fluctuations even between sequences sharing the same fold and global folding stability. Site-resolved hydrogen exchange NMR analysis of 13 domains showed that these fluctuations often involve entire secondary structural elements with lower stability than the overall fold. Computational modeling of our domains identified structural features that correlated with the experimentally observed fluctuations, enabling us to design mutations that stabilized low-stability structural segments. Our dataset enables new machine learning-based analysis of protein energy landscapes, and our experimental approach promises to reveal these landscapes at unprecedented scale.