Protein Physiology Lab, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Universidad de Buenos Aires, C1428EGA Buenos Aires, Argentina.
Consejo Nacional de Investigaciones Científicas y Técnicas-Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, C1428EGA Buenos Aires, Argentina.
Proc Natl Acad Sci U S A. 2022 Aug 2;119(31):e2204131119. doi: 10.1073/pnas.2204131119. Epub 2022 Jul 29.
Repeat proteins are made of tandem copies of similar amino acid stretches that fold into elongated architectures. These proteins are excellent model systems for investigating how evolution relates to structure, folding, and function. Here, we propose a scheme that maps evolutionary information at the sequence level onto a coarse-grained model of repeat-protein folding, and we use it to investigate the folding of thousands of repeat proteins. We model the energetics by combining an inverse Potts-model scheme with an explicit mechanistic model of repeat duplications and deletions to calculate the evolutionary parameters of the system at the single-residue level. These parameters inform an Ising-like model that generates folding curves, apparent domain emergence, and occupation of intermediate states that are highly compatible with experimental data in specific case studies. We analyzed the folding of thousands of natural Ankyrin repeat proteins and found that a multiplicity of folding mechanisms is possible. Fully cooperative all-or-none transitions are obtained for arrays with enough sequence-similar elements and strong interactions between them, whereas noncooperative, element-by-element intermittent folding arises if the elements are dissimilar and the interactions between them are energetically weak. Additionally, we characterized nucleation-propagation and multidomain folding mechanisms. We show that the global stability and cooperativity of the repeating arrays can be predicted from simple sequence scores.
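To make the idea of an Ising-like folding model concrete, the following is a minimal sketch and not the parameterization used in this work. It assumes each folding element is either unfolded (0) or folded (1), that a folded element contributes an intrinsic energy plus a temperature-dependent entropic cost, and that adjacent folded elements gain a coupling energy. The names `folded_fraction`, `eps`, `coupling`, and `entropy_cost` are hypothetical, the numerical values are arbitrary, and units are chosen so that the Boltzmann constant equals 1.

```python
import math
from itertools import product


def folded_fraction(eps, coupling, entropy_cost, temperature):
    """Mean fraction of folded elements in a toy 1D Ising-like model.

    eps          : intrinsic energy of each element when folded (negative = favorable)
    coupling     : interaction energy between neighbors i and i+1 when both are
                   folded (negative = favorable); length must be len(eps) - 1
    entropy_cost : conformational entropy lost per folded element (assumed equal
                   for all elements in this sketch)
    temperature  : temperature in units where k_B = 1
    """
    n = len(eps)
    if len(coupling) != n - 1:
        raise ValueError("coupling must have one entry per neighboring pair")

    partition = 0.0
    weighted_folded = 0.0
    # Exhaustive enumeration of all 2^n states; only practical for short arrays.
    for state in product((0, 1), repeat=n):
        # Free energy of the state: intrinsic terms plus entropic penalty.
        energy = sum(x * (e + temperature * entropy_cost)
                     for x, e in zip(state, eps))
        # Nearest-neighbor terms, counted only when both neighbors are folded.
        energy += sum(j * state[i] * state[i + 1]
                      for i, j in enumerate(coupling))
        weight = math.exp(-energy / temperature)
        partition += weight
        weighted_folded += weight * sum(state)
    return weighted_folded / (n * partition)


if __name__ == "__main__":
    # Arbitrary illustrative parameters for a six-element array.
    eps = [-3.0] * 6
    strong = [-2.0] * 5   # strongly coupled neighbors
    weak = [-0.2] * 5     # weakly coupled neighbors
    for t in (0.5, 1.0, 1.5, 2.0, 3.0, 5.0):
        print(t,
              round(folded_fraction(eps, strong, 3.0, t), 3),
              round(folded_fraction(eps, weak, 3.0, t), 3))
```

Comparing the two printed columns gives a rough feel for how stronger couplings between elements change the shape of the folding curve in such a model. Replacing the uniform `eps` and `coupling` lists with element-specific values is the natural place where sequence-derived parameters would enter, although how those values are obtained is outside the scope of this sketch.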