Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA.
The T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA.
Protein Sci. 2024 Jun;33(6):e5011. doi: 10.1002/pro.5011.
A protein sequence encodes its energy landscape: all the accessible conformations, energetics, and dynamics. The evolutionary relationship between sequence and landscape can be probed phylogenetically by compiling a multiple sequence alignment of homologous sequences and generating either common ancestors via Ancestral Sequence Reconstruction or a consensus protein containing the most common amino acid at each position. Both ancestral and consensus proteins are often more stable than their extant homologs, which raises the question of how the two differ and suggests that both approaches serve as general methods to engineer thermostability. We used the Ribonuclease H family to compare these approaches and evaluate how the evolutionary relationship of the input sequences affects the properties of the resulting consensus protein. While the consensus protein derived from our full Ribonuclease H sequence alignment is structured and active, it neither shows properties of a well-folded protein nor has enhanced stability. In contrast, the consensus protein derived from a phylogenetically restricted set of sequences is significantly more stable and cooperatively folded, suggesting that cooperativity may be encoded by different mechanisms in separate clades and lost when too many diverse clades are combined to generate a consensus protein. To explore this, we compared pairwise covariance scores using a Potts formalism as well as higher-order sequence correlations using singular value decomposition (SVD). We find that the SVD coordinates of a stable consensus sequence are close to the coordinates of the analogous ancestor sequence and its descendants, whereas the unstable consensus sequences are outliers in SVD space.
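As a rough illustration of two ideas named in the abstract, the sketch below builds a consensus sequence (the most common residue in each alignment column) and projects one-hot-encoded sequences onto their leading singular vectors. It is not the authors' pipeline: the toy alignment, the function names, the 21-letter alphabet with a gap character, and the choice to skip centering and sequence reweighting are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap character.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def consensus(msa):
    """Return the most common residue in each alignment column.

    Ties go to the residue seen first in the column (an arbitrary choice).
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*msa))


def one_hot(msa):
    """Encode aligned sequences as a (n_sequences, length * 21) binary matrix."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    width = len(ALPHABET)
    X = np.zeros((len(msa), len(msa[0]) * width))
    for row, seq in enumerate(msa):
        for col, aa in enumerate(seq):
            X[row, col * width + index[aa]] = 1.0
    return X


def svd_coordinates(msa, k=2):
    """Project each sequence onto the top-k singular vectors of the encoded MSA.

    No centering or sequence reweighting is applied (an assumption of this sketch).
    """
    U, S, _ = np.linalg.svd(one_hot(msa), full_matrices=False)
    return U[:, :k] * S[:k]


# Hypothetical toy alignment, not Ribonuclease H data.
toy_msa = ["MKVA", "MKIA", "MRVA", "LKVG"]
cons = consensus(toy_msa)

# Append the consensus as the last row so it receives coordinates
# in the same space as the input sequences.
coords = svd_coordinates(toy_msa + [cons])
```

In this toy case the consensus happens to equal the first sequence, so the two share the same SVD coordinates. With a real alignment, the distance between the consensus row and the rows of a given clade would be the quantity of interest.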