Institute for Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02139, USA; Swiss Institute of Bioinformatics (SIB), Switzerland.
Institute for Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland; The Santa Fe Institute, Santa Fe, NM, USA; Swiss Institute of Bioinformatics (SIB), Switzerland; Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, Stellenbosch 7600, South Africa.
J Mol Biol. 2022 Jan 30;434(2):167352. doi: 10.1016/j.jmb.2021.167352. Epub 2021 Nov 10.
More than a hundred proteins in yeast reversibly aggregate and phase-separate in response to various stressors, such as nutrient depletion and heat shock. We know little about the protein sequence and structural features behind this ability, which has not been characterized on a proteome-wide level. To identify the distinctive features of aggregation-prone protein regions, we apply machine learning algorithms to genome-scale limited proteolysis-mass spectrometry (LiP-MS) data from yeast proteins. LiP-MS data reveal that 96 proteins show significant structural changes upon heat shock. We find that in these proteins the propensity to phase-separate cannot be solely driven by disordered regions, because their aggregation-prone regions (APRs) are not significantly disordered. Instead, the phase separation of these proteins requires contributions from both disordered and structured regions. APRs are significantly enriched in aliphatic residues and depleted in positively charged amino acids. Aggregator proteins with longer APRs show a greater propensity to aggregate, a relationship that can be explained by equilibrium statistical thermodynamics. Altogether, our observations suggest that proteome-wide reversible protein aggregation is mediated by sequence-encoded properties. We propose that aggregating proteins resemble supramolecular amphiphiles, where APRs are the hydrophobic parts, and non-APRs are the hydrophilic parts.
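The abstract reports that APRs are enriched in aliphatic residues and depleted in positively charged amino acids. The sketch below illustrates how such a compositional comparison between a region and the rest of a protein could be set up; it is not the authors' pipeline. The residue classes (A, I, L, V as aliphatic; K, R as positively charged), the function names, and the toy sequence are all assumptions made for illustration, and a real analysis would add a statistical test across many proteins.

```python
# Assumed residue classes; the abstract does not define them.
ALIPHATIC = frozenset("AILV")
POSITIVE = frozenset("KR")  # histidine excluded here by assumption


def residue_fraction(seq, residue_class):
    """Return the fraction of residues in seq that belong to residue_class."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for aa in seq if aa in residue_class) / len(seq)


def compare_region(seq, start, end):
    """Compare the composition of seq[start:end] with the rest of the sequence.

    Returns a dict mapping each class name to (fraction_in_region, fraction_in_rest).
    """
    region = seq[start:end]
    rest = seq[:start] + seq[end:]
    return {
        "aliphatic": (residue_fraction(region, ALIPHATIC),
                      residue_fraction(rest, ALIPHATIC)),
        "positive": (residue_fraction(region, POSITIVE),
                     residue_fraction(rest, POSITIVE)),
    }


# Hypothetical toy sequence with a hypothetical APR at positions 6..12 (0-based).
toy = "MKKRSEAVLIVALGDKRKE"
result = compare_region(toy, 6, 13)
print(result)
```

In this toy case the hypothetical region is entirely aliphatic and carries no positive charge, while the flanking residues show the opposite pattern, mirroring the amphiphile picture proposed in the abstract.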