Dubreuil Benjamin, Levy Emmanuel D
Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel.
Front Mol Biosci. 2021 Apr 30;8:626729. doi: 10.3389/fmolb.2021.626729. eCollection 2021.
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is the residue level, where differences within a protein are analyzed; the second is the protein level, where differences between proteins are studied. At the residue level, we know that solvent accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein's abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of the regions affected by that property (e.g., disordered residues for mis-interactions, domain residues for misfolding). Surprisingly, however, we found that the conservation of disordered and structured regions increases in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.
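The "equal proportion" argument can be made concrete with a small sketch. The abstract does not specify the conservation measure or the fitting procedure, so everything here is an assumption for illustration: the field names (`abundance`, `rate_disordered`, `rate_structured`), the use of a log-log least-squares slope, and the toy values, which are made up and are not data from the paper. If an abundance-linked constraint acted mainly on one region type, the two slopes would be expected to differ; equal slopes correspond to a constant ratio between the two rates across abundance.

```python
import math


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def rate_abundance_slopes(proteins):
    """Return (disordered, structured) log-log slopes of rate versus abundance.

    Each protein is a dict with hypothetical keys 'abundance',
    'rate_disordered' and 'rate_structured' (all positive numbers).
    """
    log_abundance = [math.log10(p["abundance"]) for p in proteins]
    log_dis = [math.log10(p["rate_disordered"]) for p in proteins]
    log_str = [math.log10(p["rate_structured"]) for p in proteins]
    return slope(log_abundance, log_dis), slope(log_abundance, log_str)


# Toy, made-up values (NOT from the paper): disordered regions evolve twice
# as fast as structured ones, and both slow down with abundance at the same
# relative pace, so the ratio between them stays constant.
toy_proteins = [
    {
        "abundance": 10.0 ** k,
        "rate_disordered": 2.0 * 10.0 ** (-0.2 * k),
        "rate_structured": 1.0 * 10.0 ** (-0.2 * k),
    }
    for k in range(1, 6)
]

slope_disordered, slope_structured = rate_abundance_slopes(toy_proteins)
print(f"disordered slope: {slope_disordered:.3f}")
print(f"structured slope: {slope_structured:.3f}")
```

In this toy setup the two slopes coincide by construction, which is the pattern the abstract reports. With real data one would compare the fitted slopes together with their uncertainties rather than expect exact equality.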